Abstract



The work presented in this thesis focusses on dealing with timing 
covert channels in dynamic information-flow control systems, particu- 
larly for the LIO library in Haskell. 

Timing channels are dangerous in the presence of concurrency. There- 
fore, we start with the design, formalisation and implementation of a con- 
current version of LIO which is secure against them. More specifically, we 
remove leaks due to non-terminating behaviour of programs (termina- 
tion covert channel) and leaks produced by forcing certain interleavings 
of threads, as a result of affecting their timing behaviour (internal tim- 
ing covert channel). The key insight is to decouple computations so that 
threads observing the timing or termination behaviour of other threads 
are required to be at the same confidentiality level. This work only deals 
with internal timing that can be exploited through language-level opera- 
tions. We also mitigate leaks that result from the precise measurement of 
the timing of observable events (external timing covert channel), e.g. by 
using a stopwatch. 

Timing channels can also be exploited through hardware-based sha- 
red resources, such as the processor cache. This thesis presents a cache- 
based attack on LIO that relies on timing perturbations to leak sensi- 
tive information through internal timing. To address this problem, we 
modify the Haskell runtime to support instruction-based scheduling, a 
scheduling strategy that is indifferent to such perturbations from under- 
lying hardware components, such as the cache, TLB, and CPU buses. 
We show this scheduler is secure against cache-based internal timing 
attacks for applications using a single CPU. Additionally, we provide a 
purely language-based implementation of the instruction-based strategy 
for LIO, by means of a library. We leverage the notion of resumptions, 
a restricted form of continuations, to control the interleaving of threads, 
forcing each thread to yield after every LIO operation. Due to the flex- 
ibility of this approach, we are able to support parallel computation in 
the library, a novel feature in information-flow control tools. 

Finally, we present a new manifestation of internal timing in Haskell, 
by exploiting lazy evaluation to encode sensitive information as timing 
perturbations. We illustrate our claim with a concrete attack on LIO that 
relies on memoisation of shared thunks to leak information. We also 
propose a countermeasure based on restricting the implicit sharing of 
values. 
CHAPTER

ONE



INTRODUCTION 



There is no arguing that Computer Science is one of the driving forces 
behind innovation and development in the modern world. No other sci- 
ence has ever managed to transform the world as a whole in such a 
radical way as computing technology has during the last half of the 20th 
century. Our lives changed dramatically as we entered the so-called In- 
formation Age, and we started to become more and more reliant on com- 
puters for everything, including critical tasks in our society such as man- 
aging the social security system or the banking system. Information, and 
the way it disseminates, is a crucial part of this infrastructure. 

Nowadays, personal information has become a valuable commodity.
Many people own a smartphone, on which they install and use apps and
access social media websites. These apps are usually given access to
potentially sensitive information such as contacts, text messages, and
notes. Leaking sensitive information to third parties can have serious
consequences for the lives of the users, so it is necessary to develop
mechanisms to secure this information and control its propagation. The
most widespread approach is known as access control, where the user
must give explicit permission to the application to access sensitive
information or functionality. Once this access has been granted, there is
no way of knowing how the application will use this information, and
where it is propagated. For example, an app with both read access to the
phone's contacts and Internet access might send the contacts to a server
on the Internet without explicit consent from the user.

Information flow control (IFC) [14] is an alternative to access control 
that tracks how information is disseminated in a given program, and 
ensures that it is used according to a given policy. This thesis focusses
on dynamic information flow control, which involves enforcing security at
runtime by checking all potentially insecure operations as they are
performed by the program. When such a check fails, the program
execution is stopped in some way, for example by simply aborting the
program or, in some cases, by throwing a runtime exception.

Lately, concurrency has become a necessity for practical applications. 
In the last decade, multi-core processors have become commonplace, so 
programmers expect to leverage this capability by writing multi-threa- 
ded programs. However, in the context of an information-flow control 
system, naively adding concurrency introduces a new possibility to leak 
information through covert channels [8], i.e. leaking information by ex- 
ploiting system features not intended for communication. 

This work is developed in the context of a specific kind of dynamic 
enforcement, based on a floating-label approach, which borrows ideas 
from the operating systems security research community [19], and brings 
them into the field of language-based security. The main example of 
such an enforcement is LIO, a Haskell library for dynamic information- 
flow control which allows programmers to write programs with security 
guarantees. 

An information-flow-aware system is usually pictured as having in- 
formation that concerns a number of agents or principals. An information- 
flow policy specifies how these principals are related to each other, specif- 
ically in terms of how information is allowed to flow among them. There 
are two sides to information flow control: confidentiality and integrity 
of data. In the most typical scenario, we are mainly interested in con- 
fidentiality, i.e. ensuring that secret information is not visible to unau- 
thorised principals. Information-flow control aims to provide end-to-end 
security [15]. 

The main contributions of this thesis revolve around extending LIO 
with concurrency, while protecting against timing covert channels. In 
what follows, we provide a brief overview of IFC, timing covert chan- 
nels and LIO. 

1 Information flow control 

Information flow control first arose from the need to track the propaga- 
tion of information in military contexts. The classic scenario for infor- 
mation-flow control is a system which contains both secret and pub- 
lic information, and we want to ensure that the public outputs of the 
program cannot be influenced by secret information. One way of en- 
forcing this is to think of a program as having endpoints (inputs and 
outputs) where information is consumed and produced. A label is at- 
tached to each of these endpoints, which indicates whether the informa- 
tion at that point is public or secret. Whenever the program attempts a 
write operation into a public output, the information-flow control sys- 
tem must check whether the information that is being written comes 
from (or, more generally, depends on) any secret input. If that is the case, 
the program is in violation of the security policy, and is therefore
considered insecure.



1.1 Policies 



Fig. 1. Security lattice (levels: Government, Hospitals, Insurance, Public)

A policy is formalised as a relation among the security levels in the
system, which specifies how information is allowed to flow between
different levels. Typically, it is defined as a lattice structure [4] which
induces an ordering relation, usually written $\sqsubseteq$. In general,
$l_1 \sqsubseteq l_2$ means that information from level $l_1$ is allowed to
flow to level $l_2$. The canonical example is the two-point lattice with two
levels, $L$ and $H$, which respectively stand for low (public) and high
(secret), and where the only allowed flows are $L \sqsubseteq L$,
$H \sqsubseteq H$, and $L \sqsubseteq H$. In general, the elements of the
security lattice are used by the enforcement mechanism as labels for the
information flowing through the system. The lattice elements can also be
interpreted as actors/components in the system rather than just levels of
confidentiality, which allows policies to be expressed in a mutual distrust
scenario [10, 17]. Fig. 1 shows an example of such a security lattice.
The arrows indicate allowed flows. This policy allows public information
to flow to hospitals (Public $\sqsubseteq$ Hospitals) and insurance companies
(Public $\sqsubseteq$ Insurance), but it does not allow medical records in the
hospital to be disclosed to the insurance companies.
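
As a minimal sketch, the policy of Fig. 1 can be written down as a small
Haskell data type with an ordering and a join. The names Level, canFlowTo and
lub are chosen here for illustration. The placement of Government as the top
element, above both Hospitals and Insurance, is an assumption: the text only
fixes the flows out of Public and the forbidden flow from Hospitals to
Insurance.

```haskell
-- Security levels from Fig. 1.
data Level = Public | Hospitals | Insurance | Government
  deriving (Eq, Show, Enum, Bounded)

-- canFlowTo a b: information labelled a is allowed to flow to level b.
-- Assumption: Government is the top element of the lattice.
canFlowTo :: Level -> Level -> Bool
canFlowTo a b = a == b || a == Public || b == Government

-- Least upper bound (join) of two levels.
lub :: Level -> Level -> Level
lub a b
  | a `canFlowTo` b = b
  | b `canFlowTo` a = a
  | otherwise       = Government  -- Hospitals and Insurance are incomparable
```

Under these assumptions, Public `canFlowTo` Hospitals holds, Hospitals
`canFlowTo` Insurance does not, and the join of Hospitals and Insurance is
Government.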



1.2 Security property 

In this setting, the kind of security properties we would like to guaran- 
tee are known as noninterference [6] properties. They can be informally 
stated in terms of a general malicious entity (the adversary), which has 
the ability to observe data below or at a given security level, but would 
like to observe confidential information not below this level. Then, we 
consider two independent runs of a program with inputs that are in- 
distinguishable to the adversary, i.e. they only (potentially) differ in the 
parts that the adversary cannot see. We say that the program in question 
is noninterfering if the observable effects of these runs (outputs, return 
values, etc.) are also indistinguishable, as far as the adversary is con- 
cerned. 
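
The definition can be illustrated with a small test harness for the two-point
lattice. This is only a sketch under simplifying assumptions: a program is
modelled as a pure function from a public and a secret input to a public
output, and noninterference is tested on a finite sample of inputs rather
than proved. All names are hypothetical.

```haskell
-- A program takes a public and a secret input and produces a public output.
type Program = Int -> Int -> Int

-- The output depends only on the public input.
secure :: Program
secure pub _secret = pub * 2

-- The output reveals the parity of the secret.
leaky :: Program
leaky pub secret = if even secret then pub else pub + 1

-- Two runs that agree on the public input but may differ in the secret
-- must produce the same observable output.
sameObservation :: Program -> Int -> Int -> Int -> Bool
sameObservation prog pub s1 s2 = prog pub s1 == prog pub s2

-- Check the property on a finite sample of inputs (a test, not a proof).
noninterferingOn :: [Int] -> [Int] -> Program -> Bool
noninterferingOn pubs secrets prog =
  and [ sameObservation prog p s1 s2 | p <- pubs, s1 <- secrets, s2 <- secrets ]
```

On the sample [0..3] for both inputs, secure passes the check, while leaky
fails it because secrets 0 and 1 lead to different outputs.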
1.3 Enforcement mechanisms

In this thesis, we take a dynamic approach to IFC. In a dynamic IFC
system, the program is run alongside an execution monitor, a software
component that is in charge of supervising the operations performed by
the program (input/output in general) and checking that they comply
with the security policy. The monitor will interrupt the execution of
the program if a forbidden flow is detected. One example of a tool for
dynamic IFC is JSFlow [7], an information-flow-aware JavaScript
interpreter which runs as a browser plugin.

The alternative to the dynamic approach is static information flow con- 
trol [18], which consists in statically analysing a program, just by exam- 
ining its text, and classifying it as either secure or insecure depending 
on how information is propagated by it. There are several examples of 
static IFC systems, such as Jif [11] and Paragon [3]. They are both exten- 
sions to the Java language which allow programmers to express security 
policies, which are enforced using a type system. 

2 Timing covert channels 

Covert channels arise when information is leaked through mechanisms 
that were not originally designed for that purpose [8]. For example, the 
execution time of a program, the number of open files, or even the cur- 
rent volume of the speakers can be used by malicious programs to con- 
vey information to each other. In particular, timing channels affect the 
timing of programs to cause observable events to depend on secrets. 
Covert channels are, in some cases, easy to exploit if the adversary has 
access to the source code of the program, and especially so if the adver- 
sary is the one who writes it. 

The main focus of this thesis is the internal timing covert channel [16]. 
This is a timing channel that exploits the interleaving of threads in a 
concurrent system to make the outcome of data races depend on sensi- 
tive information. The adversary can learn some of this information by 
observing these outcomes. In the rest of the thesis, the way in which the 
threads can encode bits of the secret into a data race usually depends on 
shared resources among the threads. Note that we do not assume the ad- 
versary to have the ability to precisely measure time when considering 
this channel. 
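
The following sketch illustrates the idea in plain Haskell IO, outside of any
IFC system. Two threads race to write a shared public variable; neither
writes the secret itself, but the order of the writes depends on it. The
function name raceOnce and the delays are assumptions made for the example,
and the outcome relies on the delays dominating scheduling noise, which is
typical but not guaranteed.

```haskell
import Control.Concurrent (forkIO, threadDelay)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Monad (when)
import Data.IORef (newIORef, readIORef, writeIORef)

-- Run the race once and return the final value of the public variable.
raceOnce :: Bool -> IO Int
raceOnce secret = do
  public   <- newIORef 0
  doneHigh <- newEmptyMVar
  doneLow  <- newEmptyMVar
  -- Thread with access to the secret: only its timing depends on it.
  _ <- forkIO $ do
    when secret (threadDelay 200000)
    writeIORef public 1
    putMVar doneHigh ()
  -- Thread without access to the secret: fixed delay, constant write.
  _ <- forkIO $ do
    threadDelay 100000
    writeIORef public 0
    putMVar doneLow ()
  takeMVar doneHigh
  takeMVar doneLow
  readIORef public  -- the last writer wins
```

An observer who reads only the final public value can guess that the secret
was True when the value is 1, without measuring time at all.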

The channel where the information is conveyed by the measurement
of time (using a stopwatch) is known as the external timing covert channel.
Termination can be regarded as a special case, where the program takes 
an infinite amount of time to produce an output, and this fact can be 
used by the adversary to obtain sensitive information. 

It has been shown that termination and timing channels are capable
of leaking a considerable amount of information in a concurrent
setting [1]. For this reason, care must be taken when adding concurrency
to an IFC system. In this work, we start with an existing dynamic IFC 
system, LIO, and show how to make it secure against certain classes of 
timing covert channels. The following section is a brief introduction to 
this IFC system. 

3 LIO 

LIO, which stands for Labelled IO, is a dynamic information flow control 
system for Haskell, a purely functional language with strong static typ- 
ing [12]. Purity, or the absence of side-effects, means that pure code is 
only vulnerable to flows of information from parameters to return val- 
ues. Additionally, the effectful part of an LIO program is written in an 
embedded language for describing computations, which allows direct 
control over all side-effects performed by the program. Unlike main- 
stream programming languages, where any effects are allowed anywhere 
by default, Haskell in principle disallows effects everywhere, except for 
special parts of the program where effects are explicitly marked by the 
type system. These blocks are written using monads [9], an abstract data 
type for specifying and combining effectful computations. As can be 
seen from earlier work [13], this makes Haskell a suitable language for 
information-flow analysis. LIO leverages this functionality to restrict the 
side-effects that the program can perform in order to enforce security. 

LIO is embedded in Haskell as a library: programmers write their 
programs using the LIO interface, and their execution will also perform 
security checks to enforce a given policy. This library provides secu- 
rity guarantees in the form of noninterference properties, in the sense 
that every valid LIO program is noninterfering by construction (mod- 
ulo covert channels). LIO is also parameterised in the security policy, 
which is specified as a lattice over a type of security labels. 

LIO uses a floating-label approach to information flow control. An
LIO computation has a current label attached to it, which is an upper
bound on the sensitivity of the data in scope. When a computation with
current label $L_c$ observes an object $A$ with label $L_A$, its current label
must change (possibly rise) to the least upper bound or join of the two
labels, written $L_c \sqcup L_A$. The current label effectively "floats above"
the labels of all objects it observes. When performing a side-effect that
will be visible to label $L$, LIO first checks that the current label flows
to $L$ ($L_c \sqsubseteq L$) before allowing the operation to take place.
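
The two rules can be sketched as pure functions over the two-point lattice.
This is an illustration only, not the library's implementation; the names
observe and mayWrite are hypothetical.

```haskell
-- Two-point lattice: L (public) below H (secret).
data Label = L | H deriving (Eq, Ord, Show)

canFlowTo :: Label -> Label -> Bool
canFlowTo = (<=)  -- the derived ordering places L below H

lub :: Label -> Label -> Label
lub = max

-- Observing an object raises the current label to the join of both labels.
observe :: Label -> Label -> Label
observe current object = current `lub` object

-- A side-effect visible at label target is allowed only if the current
-- label flows to target.
mayWrite :: Label -> Label -> Bool
mayWrite current target = current `canFlowTo` target
```

For instance, a computation that starts at L and observes an object labelled
H ends up at H, after which mayWrite H L is False: it can no longer perform
public side-effects.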

Fig. 2 shows a basic example of how LIO works, assuming the
security lattice from Fig. 1. The code is the definition of a malicious
function stealInfo, which attempts to steal confidential medical information
and send it to an insurance company. The function takes one argument,
medicalRecord, which has the label Hospital, and is supposed to contain
the medical record of a person. This is an example of an LIO labelled
value, which is simply a value protected by a security label. Labelled
values must be unlabelled using the unlabel primitive, which returns
the value itself and raises the current label of the LIO computation
accordingly. In this example, we assume that the LIO computation has the
label Public as its current label. After the first line has been executed, the
current label of the computation would be Hospital. The medical record
itself would be bound to m. In the second line, the program attempts to
send m (see footnote 1) to some insurance company, which has label
Insurance, so the security check Hospital $\sqsubseteq$ Insurance is
performed. Since the policy does not allow this flow, the program would be
stopped at this point, and m would not reach insuranceCompany.
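
To make the example concrete, the following is a toy monitor in which the
stealInfo function can actually be run. It is a sketch under several
assumptions and not the LIO library itself: the LIO type here is a
hand-written state-and-failure monad, sendTo is a hypothetical output
operation that identifies the recipient with its label, and Government is
assumed to be the top of the lattice.

```haskell
data Label = Public | Hospital | Insurance | Government
  deriving (Eq, Show)

canFlowTo :: Label -> Label -> Bool
canFlowTo a b = a == b || a == Public || b == Government

lub :: Label -> Label -> Label
lub a b
  | a `canFlowTo` b = b
  | b `canFlowTo` a = a
  | otherwise       = Government

-- A value protected by a label.
data Labeled a = Labeled Label a

-- Toy monitor: a computation threads the current label and may be stopped.
newtype LIO a = LIO { runLIO :: Label -> Either String (a, Label) }

instance Functor LIO where
  fmap f (LIO g) = LIO $ \l -> fmap (\(a, l') -> (f a, l')) (g l)

instance Applicative LIO where
  pure a = LIO $ \l -> Right (a, l)
  LIO f <*> LIO g = LIO $ \l -> do
    (h, l1) <- f l
    (a, l2) <- g l1
    pure (h a, l2)

instance Monad LIO where
  LIO g >>= k = LIO $ \l -> do
    (a, l1) <- g l
    runLIO (k a) l1

-- Return the protected value and raise the current label accordingly.
unlabel :: Labeled a -> LIO a
unlabel (Labeled lv a) = LIO $ \cur -> Right (a, cur `lub` lv)

-- Hypothetical output: allowed only if the current label flows to the
-- label of the recipient.
sendTo :: Label -> String -> LIO ()
sendTo recipient _msg = LIO $ \cur ->
  if cur `canFlowTo` recipient
    then Right ((), cur)
    else Left ("blocked: " ++ show cur ++ " may not flow to " ++ show recipient)

stealInfo :: Labeled String -> LIO ()
stealInfo medicalRecord = do
  m <- unlabel medicalRecord
  sendTo Insurance m
```

Running stealInfo on a record labelled Hospital with initial current label
Public yields a Left value: unlabel raises the current label to Hospital, and
the check against Insurance then fails.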

    stealInfo medicalRecord =
      do m <- unlabel medicalRecord
         sendTo insuranceCompany m

Fig. 2. Simple LIO code

When writing LIO programs, one must be careful in the way that the
program is structured and how the operations interact with the current
label. It is a common mistake to unlabel too many values from several
sources in the same context, inadvertently raising the current label so
much that no useful outputs can be performed any more.
This effect is known as label creep, and can be alleviated by a combination 
of mindful programming and a local scoping primitive called toLabeled. 
The changes to the current label in a toLabeled block are undone after 
the block is executed, restoring the current label to what it was before 
running the block. In this sense, the floating-label approach seems to be 
a double-edged sword: on the one hand, it is perhaps a good idea to 
force programmers to structure their programs as a collection of blocks 
of code with different labels; on the other hand, the current label im- 
poses a restrictive style that may constrain the programmer too much. 
Practical experience from the developers of GitStar [5], an
information-flow-aware system built on top of LIO, seems to indicate that
the model is appropriate for writing such systems and that the programmers
did not feel overly constrained by it.
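
The scoping behaviour of toLabeled can be sketched with a deliberately
simplified model in which a computation is a function from the current label
to a result and a new current label. This toy version ignores failure and
differs from the library's actual primitive; it only shows how the current
label is restored while the result stays protected.

```haskell
-- Two-point lattice: L (public) below H (secret).
data Label = L | H deriving (Eq, Ord, Show)

-- A value protected by a label.
data Labeled a = Labeled Label a deriving (Eq, Show)

-- A computation maps the current label to a result and a new current label.
type Comp a = Label -> (a, Label)

-- Unlabelling raises the current label to the join of both labels.
unlabel :: Labeled a -> Comp a
unlabel (Labeled l a) cur = (a, max cur l)

-- Run body in a local scope: the result is wrapped with the label that the
-- body reached, and the current label is restored to its previous value.
toLabeled :: Comp a -> Comp (Labeled a)
toLabeled body cur =
  let (a, inner) = body cur
  in (Labeled inner a, cur)
```

Unlabelling an H value directly from current label L leaves the computation
at H, whereas wrapping the same step in toLabeled leaves it at L and returns
the result as a value labelled H.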



1 The operation sendTo does not exist as a primitive; it is just meant as an
example of an effectful operation that performs an output.



4 Thesis overview



The contents of this thesis have been published as individual papers
in the proceedings of peer-reviewed conferences and symposia. Each of
the four chapters that follow presents one of these papers. This section
briefly outlines their contents and states the contributions of the author.
Fig. 3 gives an overview of the four main chapters of the thesis. Chap- 
ters 2 and 5 deal with language-based covert channels, while Chapters 3 
and 4 present two different ways of addressing hardware-based timing 
perturbations such as those caused by the processor cache. 



Fig. 3. Overview of the thesis. Covert timing channels are divided into
language-based channels (Chapter 2: concurrency; Chapter 5: lazy
evaluation) and hardware-based timing perturbations (Chapter 3: modified
scheduler; Chapter 4: library).



Chapter 2: Addressing Covert Termination and Timing Channels in 
Concurrent Information Flow Systems 

As explained before, confidential information may be leaked through 
termination and timing channels. The termination covert channel has 
limited bandwidth for sequential programs [1], but it is a more danger- 
ous source of information leakage in concurrent settings. 

In this chapter, we present an information-flow control system that 
is secure against the termination and internal timing channels, i.e. situa- 
tions in which the outcome of a race among several threads depends on 
confidential information. Intuitively, we leverage concurrency by plac- 
ing such potentially sensitive actions in separate threads, each with its 
own floating label. Then, we require other threads to raise their current 
label accordingly before observing termination and timing of
higher-confidentiality contexts. Additionally, we show how to mitigate
external timing in this setting using ideas from Askarov et al. [2]. The
chapter introduces the concurrent version of LIO, which is, to the best of
our knowledge, the first concurrent dynamic IFC system that deals with 
timing channels. 

Statement of contributions The paper was co-authored with Deian 
Stefan, Alejandro Russo, Amit Levy, John C. Mitchell, and David Maz- 
ieres. Pablo was mainly responsible for the contents of the Soundness 
section. 

This chapter was published as a paper in the proceedings of the 17th
ACM SIGPLAN International Conference on Functional Programming
(ICFP) 2012.
Chapter 3: Eliminating cache-based timing attacks with instruction-based
scheduling

In this chapter, we show that concurrent deterministic IFC systems 
that use time-based scheduling are vulnerable to a cache-based internal 
timing channel. We demonstrate this vulnerability with a concrete attack 
on LIO, which can be used to attack GitStar. The secret is encoded in the 
hardware cache, a resource that is implicitly shared among all threads 
and not modelled in the previous paper. As a result, the cache is not 
subject to LIO's usual IFC mechanisms, and the attack succeeds. 

To eliminate this internal timing channel, we implement instruction- 
based scheduling, a new kind of scheduler that is indifferent to timing per- 
turbations from underlying hardware components, such as the cache, 
TLB, and CPU buses. We show this scheduler is secure against cache- 
based internal timing attacks for applications using a single CPU. 

Statement of contributions This paper was co-authored with Deian 
Stefan, Edward Z. Yang, Amit Levy, David Terei, Alejandro Russo, and 
David Mazieres. Pablo discovered the cache-based attack for LIO, con- 
tributed to the design of the cache-aware semantics and was responsible 
for the Semantics and Formal guarantees sections. 

This chapter was published as a paper in the proceedings of the 18th 
European Symposium on Research in Computer Security (ESORICS) 
2013. 

Chapter 4: A library for removing cache-based attacks in concurrent 
information flow systems 

In the previous chapter we present cache-based attacks in concurrent 
information flow systems, and provide a solution which involves modi- 
fying the scheduler in the Haskell runtime. In this chapter, we tackle the 
same problem from a purely language-based perspective, by providing 
a Haskell library that can be used as a replacement for concurrent LIO 
and which is resilient against cache-based attacks. We leverage resump- 
tions - a tame form of continuations - to attain fine-grained control over 
the interleaving of thread computations at the library level. Specifically, 
we remove cache-based attacks by ensuring that every thread yields af- 
ter executing an "instruction", i.e., an atomic action, in analogy with the 
behaviour of the instruction-based scheduler from the previous paper. 
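
The scheduling idea can be sketched in a few lines of Haskell. This is a
pure model written for illustration, not the library described in the
chapter: a thread is a resumption, i.e. either finished or one named atomic
action followed by the rest of the thread, and the scheduler runs exactly one
action of each thread in turn.

```haskell
-- A thread as a resumption: finished, or one atomic action (represented
-- here by its name) followed by the remainder of the thread.
data Thread = Done | Atom String Thread

-- Build a thread from a list of action names.
fromActions :: [String] -> Thread
fromActions = foldr Atom Done

-- Round-robin scheduling: every thread yields after each atomic action.
-- The result is the order in which the actions are executed.
roundRobin :: [Thread] -> [String]
roundRobin []              = []
roundRobin (Done : ts)     = roundRobin ts
roundRobin (Atom a k : ts) = a : roundRobin (ts ++ [k])
```

For example, roundRobin [fromActions ["a1", "a2"], fromActions ["b1"]]
evaluates to ["a1", "b1", "a2"]. In this model the interleaving depends only
on how many actions each thread performs, not on how long any action takes,
which is the property that makes the schedule insensitive to cache effects.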

Statement of contributions This paper was co-authored with Amit 
Levy, Deian Stefan, Alejandro Russo and David Mazieres. Pablo con- 
tributed to the design and implementation of the library, and was re- 
sponsible for the formalisation and proofs. 

This chapter was published as a paper in the proceedings of the 
8th International Symposium on Trustworthy Global Computing (TGC) 
2013. 
Chapter 5: Lazy Programs Leak Secrets

Haskell's evaluation mechanism is lazy, which means that arguments 
to functions are not evaluated until they are needed in the body of the 
function. Crucially, when such an argument (also known as a thunk) is
finally evaluated, its value gets cached and is reused in subsequent
occurrences of the same argument. In this chapter, we describe a novel
exploit of lazy evaluation to reveal secrets in IFC systems through
internal timing. We illustrate our claim with an attack on LIO. This attack is
analogous to the cache-based attack, since thunks work like caches and 
can be shared by multiple threads by merely holding a pointer to them. 
We propose a countermeasure based on restricting the implicit sharing 
caused by lazy evaluation, but we do not implement these ideas, leaving 
them for future work. 
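
The caching behaviour that the attack relies on can be observed directly.
The sketch below, written in plain Haskell IO rather than LIO, forces the
same thunk twice and measures the CPU time of each force; the helper names
are hypothetical and the exact timings depend on the machine and compiler
settings.

```haskell
import Control.Exception (evaluate)
import System.CPUTime (getCPUTime)

-- Measure the CPU time (in picoseconds) needed to force a value to weak
-- head normal form.
timeForce :: a -> IO Integer
timeForce x = do
  start <- getCPUTime
  _ <- evaluate x
  end <- getCPUTime
  pure (end - start)

-- Force one shared thunk twice and report both timings.
forceTwice :: IO (Integer, Integer)
forceTwice = do
  let shared = sum [1 .. 3000000 :: Int]  -- an unevaluated thunk
  first  <- timeForce shared              -- performs the computation
  second <- timeForce shared              -- reuses the cached value
  pure (first, second)
```

The second force is expected to be much cheaper than the first. If a thread
handling secret data decides whether to force a shared thunk based on a
secret, a public thread that later forces the same thunk experiences a
timing difference that can influence the outcome of a race.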

Statement of contributions This paper was co-authored with Ale- 
jandro Russo. Pablo discovered the attack and contributed to the design 
of the proposed solution. 

This chapter was published as a paper in the proceedings of the 8th 
Nordic Conference on Secure IT Systems (NordSec) 2013. 

References 

1. A. Askarov, S. Hunt, A. Sabelfeld, and D. Sands. Termination-insensitive 
noninterference leaks more than just a bit. In Proceedings of the 13th European 
Symposium on Research in Computer Security: Computer Security, ESORICS '08, 
pages 333-348, Berlin, Heidelberg, 2008. Springer-Verlag.

2. A. Askarov, D. Zhang, and A. C. Myers. Predictive black-box mitigation of 
timing channels. In Proc. of the 17th ACM CCS. ACM, 2010. 

3. N. Broberg, B. van Delft, and D. Sands. Paragon for practical programming 
with information-flow control. In C.-c. Shan, editor, Programming Languages 
and Systems, volume 8301 of Lecture Notes in Computer Science, pages 217-232. 
Springer International Publishing, 2013. 

4. D. E. Denning. A lattice model of secure information flow. Communications 
of the ACM, 19(5):236-243, May 1976. 

5. D. B. Giffin, A. Levy, D. Stefan, D. Terei, D. Mazieres, J. C. Mitchell, and 
A. Russo. Hails: Protecting data privacy in untrusted web applications. In 
Proceedings of the 10th USENIX Conference on Operating Systems Design and 
Implementation, OSDI'12, pages 47-60, Berkeley, CA, USA, 2012. USENIX As- 
sociation. 

6. J. A. Goguen and J. Meseguer. Security policies and security models. In 
IEEE Symposium on Security and Privacy, pages 11-20. IEEE Computer Soci- 
ety, 1982. 

7. D. Hedin, A. Birgisson, L. Bello, and A. Sabelfeld. JSFlow: Tracking
information flow in JavaScript and its APIs. Proc. 29th ACM Symposium on Applied
Computing, 2014. 

8. B. W. Lampson. A note on the confinement problem. Commun. ACM, 
16(10):613-615, Oct. 1973. 






9. E. Moggi. Notions of computation and monads. Information and Computation, 
93(1):55-92, 1991.

10. A. C. Myers and B. Liskov. A decentralized model for information flow 
control. In Proc. of the 16th ACM Symp. on Operating Systems Principles, pages 
129-142, 1997.

11. A. C. Myers and B. Liskov. Protecting privacy using the decentralized label 
model. ACM Trans. on Software Engineering and Methodology, 9(4):410-442, October 2000.

12. S. Peyton Jones et al. The Haskell 98 language and libraries: The
revised report. Journal of Functional Programming, 13(1):0-255, Jan 2003.
http://www.haskell.org/definition/.

13. A. Russo, K. Claessen, and J. Hughes. A library for light-weight information- 
flow security in Haskell, 2008. 

14. A. Sabelfeld and A. C. Myers. Language-based information-flow security. 
IEEE Journal on Selected Areas in Communications, 21(1), January 2003. 

15. J. H. Saltzer, D. P. Reed, and D. D. Clark. End-to-end arguments in systems 
design. ACM Trans. on Computer Systems, 2(4):277-288, 1984.

16. G. Smith and D. Volpano. Secure information flow in a multi-threaded im- 
perative language. In Proc. ACM Symp. on Principles of Prog. Languages, Jan. 
1998. 

17. D. Stefan, A. Russo, D. Mazieres, and J. C. Mitchell. Disjunction category 
labels. In Proc. of the NordSec 2011 Conference, October 2011. 

18. D. Volpano, C. Irvine, and G. Smith. A sound type system for secure flow 
analysis. J. Comput. Secur., 4(2-3):167-187, Jan. 1996.

19. N. Zeldovich, S. Boyd-Wickizer, E. Kohler, and D. Mazieres. Making infor- 
mation flow explicit in HiStar. In Proc. of the 7th Symp. on Operating Systems 
Design and Implementation, pages 263-278, Seattle, WA, November 2006. 



CHAPTER 

TWO 



ADDRESSING COVERT TERMINATION AND 
TIMING CHANNELS IN CONCURRENT 
INFORMATION FLOW SYSTEMS 



Deian Stefan, Alejandro Russo, Pablo Buiras 
Amit Levy, John C. Mitchell, David Mazieres 



Abstract. When termination of a program is observable by an ad- 
versary, confidential information may be leaked by terminating ac- 
cordingly. While this termination covert channel has limited band- 
width for sequential programs, it is a more dangerous source of 
information leakage in concurrent settings. We address concurrent 
termination and timing channels by presenting an information- 
flow control system that mitigates and eliminates these channels 
while allowing termination and timing to depend on secret values. 
Intuitively, we leverage concurrency by placing such potentially 
sensitive actions in separate threads. While termination and tim- 
ing of these threads may expose secret values, our system requires 
any thread observing these properties to raise its information-flow 
label accordingly, preventing leaks to lower-labeled contexts. We 
develop our approach in a Haskell library and demonstrate its 
applicability by implementing a web server that uses information- 
flow control to restrict untrusted web applications. 
1 Introduction

Covert channels arise when programming language features are mis- 
used to leak information [28]. For example, when termination of a pro- 
gram is observable to an adversary, a program may intentionally or acci- 
dentally communicate a confidential bit by terminating according to the 
value of that bit. While this termination covert channel has limited band- 
width for sequential programs, it is a significant source of information 
leakage in concurrent settings. Similar issues arise with covert timing 
channels, which are potentially widespread because so many programs 
involve loops or recursive functions. These channels, based on either in- 
ternal observation by portions of the system or external observation, are 
also effective in concurrent settings. 

We present an information-flow system that mitigates and eliminates 
termination and timing channels in concurrent systems, while allow- 
ing timing and termination of loops and recursion to depend on secret 
values. Because the significance of these covert channels depends on 
concurrency, we fight fire with fire by leveraging concurrency to miti- 
gate these channels: we place potentially nonterminating actions, or ac- 
tions whose timing may depend on secret values, in separate threads. 
While termination and timing of these threads may expose secret val- 
ues, our system requires any thread observing these properties to raise 
its information-flow label accordingly. We develop our approach in a 
Haskell library that uses the Haskell type system to prevent code from 
circumventing dynamic information-flow tracking. We demonstrate the 
applicability of this approach by implementing a web server that applies 
information-flow control to untrusted web applications. Although we 
do not address underlying hardware issues such as cache timing, our 
language-level methods can be combined with hardware-level mech- 
anisms as needed to provide comprehensive defenses against covert 
channels. 

Termination covert channel Askarov et al. [2] show that for sequential 
programs with outputs, the termination covert channel can only be ex- 
ploited by exponentially complex brute-force: no attacker can reliably 
learn the secret in time polynomial in the size of the secret. Moreover, if 
secrets are uniformly distributed, the attacker's advantage (after observ- 
ing a polynomial amount of output) is negligible in comparison with the 
size of the secret. Because of this relatively low risk, accepted sequential 
information-flow tools such as Jif [34], and FlowCaml [42], are only de- 
signed to address termination-insensitive noninterference. In a concur- 
rent setting, however, the termination covert channel may be exploited 
more significantly [19]. We therefore focus on termination covert chan- 
nels in concurrent programs and present an extension to our Haskell 
LIO library [46], which provides dynamic tracking of labeled values. 
By providing labeled forkLIO and waitLIO, our extension removes the
termination covert channel from sequential and concurrent programs 
while allowing loops whose termination conditions depend on secret 
information. 

Internal timing channel Multi-threaded programs can leak information 
through an internal timing covert channel [50] when the observable timing 
behavior of a thread depends on secret data. This occurs when the time 
to produce a public event, such as placing public data on a public chan- 
nel, depends on secret data, or, more generally, when a race to acquire 
a shared resource may be affected by secrets. We close this covert chan- 
nel using the same approach as termination leaks: we decouple the exe- 
cution of public events from computations that manipulate secret data. 
Using labeled forkLIO and waitLIO, computation depending on secret
data proceeds in a new thread, and the number of instructions executed 
before producing public events does not depend on secrets. Therefore, a 
possible race to a shared public resource does not depend on the secret, 
eliminating internal timing leaks. 

External timing channel External timing covert channels, which involve 
externally measuring the time used to complete operations that may de- 
pend on secret information, have been used in practice to leak informa- 
tion [7, 17] and break cryptography [18, 26, 51]. While several mecha- 
nisms exist to mitigate external timing channels [1, 5, 20], external timing 
channels are not addressed by conventional information-flow tools and 
in fact most of the previous techniques for language-based information- 
flow control appear to have limited application. Our contribution to ex- 
ternal timing channels is to bring the mitigation techniques from the OS 
community into the language-based security setting. Generalizing pre- 
vious work [3], Zhang et al. [53] propose a black-box mitigation tech- 
nique that we adapt to a language-based security setting. In this ap- 
proach, the source of observable events is wrapped by a timing mitigator 
that delays output events so that they contain only a bounded amount 
of information. We take advantage of the way Haskell makes it possi- 
ble to identify when outputs are produced and implement the mitigator 
as part of the LIO library. Leveraging Haskell monad transformers [31],
we show how to modularly extend LIO, or any other library performing
side effects in Haskell, to provide a suitable form of Zhang et al.'s
mitigator. 

In summary, the main contributions of this paper are: 
► We present an information flow control (IFC) system that eliminates
the termination and internal timing covert channels, while mitigat- 
ing the external timing one. The system provides support for threads, 
light-weight synchronization primitives, and allows loops and bran- 
ches to depend on sensitive (high) values. We believe this is the first 
implementation of a language-based IFC system for concurrency that
does not rely on cooperative scheduling.

► We eliminate termination and internal-timing covert channels us- 
ing concurrency with potentially sensitive actions run in separate 
threads. This is implemented in a Haskell library that uses labeled 
concurrency primitives 1 . 

► We provide language-based support for resource-usage mitigation 
using monad transformers. We use this method to implement the 
black-box external timing mitigation approach of Zhang et al.; the 
method is also applicable to other covert channels, such as storage. 

► We demonstrate the language implementation by building a sim- 
ple server-side web application framework. In this framework, un- 
trusted applications have access to a persistent key-value store. More- 
over, requests to apps may be from malicious clients colluding with 
the application in order to learn sensitive information. We show sev- 
eral potential leaks through timing and termination and show how 
our library is used to address them. 

Section 2 provides background on information flow, Haskell, and
the Haskell LIO monad. We discuss the termination covert channel and 
its elimination in Section 3, the internal timing covert channel and its 
elimination in Section 4, and the external timing channel and its mitiga- 
tion in Section 5. Formalization of the library is given in Section 6 and 
the security guarantees in Section 7. The implementation and experi- 
mental evaluation are presented in Section 8. Related work is described 
in Section 9. We conclude in Section 10. 

2 Background 

We build on a dynamic information flow control library in Haskell called 
LIO [46] . This section describes LIO and some of its relevant background. 
We first give an overview of IFC in abstract terms. We then give a brief 
overview of Haskell. Finally, we describe how LIO is implemented tak- 
ing advantage of Haskell's static typing and pure functional nature. 

2.1 Information flow control 

IFC's goal is to track and control the propagation of information. In an 
IFC system, every observable bit has an associated label. Moreover, labels 
form a lattice [12] governed by a partial order $\sqsubseteq$, pronounced "can flow
to." The value of a bit labeled $L_{out}$ can depend on a bit labeled $L_{in}$ only
if $L_{in} \sqsubseteq L_{out}$.

1 The library implementations discussed in this paper can be found at
http://www.scs.stanford.edu/~deian/concurrent_lio
In a floating-label system, every execution context has a label that can
rise to accommodate reading more sensitive data. For a process P labeled
$L_P$ to observe an object labeled $L_O$, P's label must rise to the least
upper bound or join of the two labels, written $L_P \sqcup L_O$. P's label
effectively "floats above" the labels of all objects it observes. Furthermore,
systems frequently associate a clearance with each execution context that 
bounds its label. 

Specific label formats depend on the application and are not the focus
of this work. Instead, we will focus on a very simple two-point lattice
with labels Low and High, where Low $\sqsubseteq$ High and High
$\not\sqsubseteq$ Low. We note, however, that our implementation is
polymorphic in the label type, and any label format that implements a few
basic relations (e.g., $\sqsubseteq$, join $\sqcup$, and meet $\sqcap$) can
be used when building applications. The LIO library supports privileges,
which are used to implement decentralized information flow control as
originally presented in [33]; though we do not discuss privileges in this
paper, our implementation also provides privileged versions of the
combinators described in later sections.
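As an illustration of what "a few basic relations" could look like, the following is a minimal sketch of a label interface and its two-point instance. The class name Label matches the text, but the method names canFlowTo, lub, and glb, and the type name TwoPoint, are assumptions made for this example and are not taken from the library.

```haskell
-- Hypothetical names throughout; this is not the LIO library's actual API.

-- A label type must provide the "can flow to" order, a join, and a meet.
class Eq l => Label l where
  canFlowTo :: l -> l -> Bool  -- the partial order on labels
  lub       :: l -> l -> l     -- join: least upper bound
  glb       :: l -> l -> l     -- meet: greatest lower bound

-- The two-point lattice used in the text.
data TwoPoint = Low | High
  deriving (Eq, Ord, Show)

instance Label TwoPoint where
  canFlowTo a b = a <= b  -- the derived Ord instance orders Low before High
  lub = max
  glb = min
```

With this instance, canFlowTo Low High holds while canFlowTo High Low does not, which is exactly the ordering assumed in the rest of the paper.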

2.2 Haskell 

We choose the Haskell programming language because its abstractions 
allow IFC to be implemented in a library [29]. Building a library is far 
simpler than developing a programming language from scratch (or heav- 
ily modifying a compiler). Moreover, a library offers backwards compat- 
ibility with a large body of existing Haskell code. 

From a security point of view, Haskell's most distinctive feature is 
a clear separation of pure computations from those with side-effects. 
Any computation with side-effects must have a type encapsulated by 
the monad IO. The main idea behind the LIO library is that untrusted
actions must be specified with a new LIO monad instead of IO. Because
the types are different, untrusted code cannot bind IO actions to LIO
ones. The only IO actions that can be executed within LIO actions are the
ones that have been wrapped in the LIO type using a private constructor
only visible to trusted code. All such wrapped IO actions perform label
checks to enforce IFC. 

2.3 The LIO monad 

In this section, we give an overview of LIO. LIO dynamically enforces 
IFC but, without the features described in this paper, provides only
termination-insensitive IFC [2] for sequential programs. At a high level, LIO
provides a monad called LIO (Labeled I/O) intended to be used in place
of IO. The library furthermore contains a collection of LIO actions, many
of them similar to IO actions from standard Haskell libraries, except that
the LIO versions contain label checks that enforce IFC. For instance, LIO
provides file operations that look like those of the standard library,
except that they confine the application to a dedicated portion of the file
system where they store a label along with each file. 

LIO is a floating-label system. The LIO monad keeps a current label,
$L_{cur}$, that is effectively a ceiling over the labels of all data that the
current computation may depend on. LIO also maintains a current clearance,
$C_{cur}$, which specifies an upper bound on permissible values of $L_{cur}$.

LIO does not individually label definitions and bindings. Rather, all
symbols in scope are identically labeled with $L_{cur}$. The only way to
observe or modify differently labeled data is to execute actions that
internally access privileged symbols. Such actions are responsible for
appropriately validating and adjusting the current label.

As an example, the LIO file-reading function readFile, when executed
on a file labeled $L_F$, first checks that $L_F \sqsubseteq C_{cur}$, throwing an
exception if not. If the check succeeds, the function raises $L_{cur}$ to
$L_{cur} \sqcup L_F$ before returning the file content. The LIO file-writing
function, writeFile, throws an exception if $L_{cur} \not\sqsubseteq L_F$.
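The read and write checks just described can be sketched as pure functions over the current label and clearance. This is only a model under simplifying assumptions: the names Ctx, readCheck, and writeCheck are hypothetical, labels are restricted to the two-point lattice, and an exception is modeled as a Left value.

```haskell
-- Hypothetical model of the label checks; not the library's implementation.
data Lbl = Low | High
  deriving (Eq, Ord, Show)

-- The execution context: current label and clearance.
data Ctx = Ctx { curLabel :: Lbl, clearance :: Lbl }
  deriving (Eq, Show)

-- Reading an object labeled lf: allowed only if lf can flow to the
-- clearance; on success the current label floats up to the join.
readCheck :: Lbl -> Ctx -> Either String Ctx
readCheck lf ctx
  | lf <= clearance ctx = Right ctx { curLabel = max (curLabel ctx) lf }
  | otherwise           = Left "read denied: object label exceeds clearance"

-- Writing an object labeled lf: allowed only if the current label can
-- flow to lf; the current label is left unchanged.
writeCheck :: Lbl -> Ctx -> Either String Ctx
writeCheck lf ctx
  | curLabel ctx <= lf = Right ctx
  | otherwise          = Left "write denied: current label cannot flow to object"
```

For instance, starting from Ctx Low High, a readCheck on a High file yields a context whose current label is High, after which writeCheck on a Low file is rejected.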

As previously mentioned, to allow experimentation with different
label formats, LIO actions are parameterized by the label type. For
instance, simplifying slightly:

readFile :: (Label l) => FilePath -> LIO l String

To be more precise, it is really (LIO l) that is a replacement for the
IO monad, where l can be any label type. The context (Label l) =>
in readFile's type signature restricts l to types that are instances of
the Label typeclass, which abstracts the label specifics behind the basic
methods $\sqsubseteq$, $\sqcup$, and $\sqcap$.

2.4 Labeled values 

Since LIO protects all nameable values with $L_{cur}$, we need a way to
manipulate differently-labeled data without monotonically increasing $L_{cur}$.
For this purpose, LIO provides explicit references to labeled, immutable
data through a polymorphic data type called Labeled. A locally accessible
symbol (at $L_{cur}$) can name, say, a Labeled l Int (for some label
type l), which contains an Int protected by a different label.

Several functions allow creating and using Labeled values:
► label :: (Label l) => l -> a -> LIO l (Labeled l a)
  Given a label l such that $L_{cur} \sqsubseteq l \sqsubseteq C_{cur}$ and a value v,
  action label l v returns a Labeled value guarding v with label l.

► unlabel :: (Label l) => Labeled l a -> LIO l a
  If lv is a Labeled value v with label l, unlabel lv raises $L_{cur}$ to
  $L_{cur} \sqcup l$ (provided $L_{cur} \sqsubseteq C_{cur}$ still holds, otherwise it throws
  an exception) and returns v.

► toLabeled :: (Label l) => l -> LIO l a -> LIO l (Labeled l a)
  The dual of unlabel: given an action m that would raise $L_{cur}$ to $L'_{cur}$
  where $L'_{cur} \sqsubseteq l \sqsubseteq C_{cur}$, toLabeled l m executes m without
  raising $L_{cur}$, and instead encapsulates m's result in a Labeled value
  protected by label l.

► labelOf :: (Label l) => Labeled l a -> l
  Returns the label of a Labeled value.

Listing 1 Exploiting the termination channel by brute-force

bruteForce :: String -> Int -> Labeled l Int -> LIO l ()
bruteForce msg n secret = forM_ [0..n] $ \i -> do
  toLabeled High $ do
    s <- unlabel secret
    if s == i then undefined else return ()
  outputLow (msg ++ show i)

As an example, we show an LIO action that adds two Labeled Ints:

addLIO lA lB = do a <- unlabel lA
                  b <- unlabel lB
                  return (a + b)

If the inputs' labels are $L_A$ and $L_B$, this action raises $L_{cur}$ to
$L_A \sqcup L_B \sqcup L_{cur}$ and returns the sum of the values.
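To see the floating label at work in this example, the sketch below models LIO as a tiny hand-written monad that threads the current label and can fail. It is a toy stand-in, not the library: the monad is monomorphic in a two-point label type, the clearance is assumed to be High (so unlabel never fails), and the constructor Labeled is exposed only for the sake of the example.

```haskell
-- A toy model of LIO, written only to illustrate label floating.
data Lbl = Low | High
  deriving (Eq, Ord, Show)

-- A computation takes the current label and returns a result together
-- with the (possibly raised) current label, or an error message.
newtype LIO a = LIO { runLIO :: Lbl -> Either String (a, Lbl) }

instance Functor LIO where
  fmap f (LIO g) = LIO $ \l -> case g l of
    Left e        -> Left e
    Right (a, l') -> Right (f a, l')

instance Applicative LIO where
  pure a = LIO $ \l -> Right (a, l)
  LIO f <*> LIO g = LIO $ \l -> case f l of
    Left e        -> Left e
    Right (h, l') -> case g l' of
      Left e         -> Left e
      Right (a, l'') -> Right (h a, l'')

instance Monad LIO where
  LIO g >>= k = LIO $ \l -> case g l of
    Left e        -> Left e
    Right (a, l') -> runLIO (k a) l'

-- A value protected by a label.
data Labeled a = Labeled Lbl a

-- label: the requested label must be at or above the current label.
label :: Lbl -> a -> LIO (Labeled a)
label l v = LIO $ \cur ->
  if cur <= l
    then Right (Labeled l v, cur)
    else Left "label: current label cannot flow to requested label"

-- unlabel: raise the current label to its join with the value's label.
unlabel :: Labeled a -> LIO a
unlabel (Labeled l v) = LIO $ \cur -> Right (v, max cur l)

addLIO :: Labeled Int -> Labeled Int -> LIO Int
addLIO lA lB = do
  a <- unlabel lA
  b <- unlabel lB
  return (a + b)
```

Running addLIO on a Low value and a High value from a Low starting label returns the sum paired with a final current label of High, which is the join described above.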

We note that in an imperative language with labeled variables, dy- 
namic labels can lead to implicit flows [13]. The canonical example is as 
follows: 

public := 0; // public has a Low label 
if (secret) // secret has a High label 
public := 1; // public depends on secret 

To avoid directly leaking the secret bit into public, one should 
track the label of the program counter and determine that execution of 
the assignment public := 1 depends on secret, and raise public's 
label when assigning public := 1. However, since the assignment ex- 
ecutes conditionally depending on secret, now public's label leaks the 
secret bit. LIO does not suffer from implicit flows. When branching on 
a secret, $L_{cur}$ becomes High and therefore no public events are possible.

3 The termination covert channel 

As mentioned in the introduction, information-flow control results and 
techniques for sequential settings do not naturally generalize to concurrent
settings. In this section we highlight that the sequential LIO library
allows leaks due to termination and show that a naive (but typical) ex- 
tension that adds concurrency drastically amplifies this leak. We present 
a modification to the LIO library that eliminates the termination covert 
channel from both sequential and concurrent programs; our solution al- 
lows for flexible programming patterns, even writing loops whose ter- 
mination condition depends on secret data. 

Listing 1 shows an implementation of an attack (previously described
by Askarov et al. [2]) that leaks a secret in a brute-force way through
the termination covert channel. Function bruteForce takes three arguments:
a string message, the public maximum value that a non-negative
secret Int can have, and a secret labeled Int. Given these arguments,
the function returns an LIO action that, when executed, returns unit (),
while producing intermediate side effects. Namely, bruteForce writes to
a Low labeled channel using outputLow while $L_{cur}$ is Low. We assume
that bruteForce is executed with the initial $L_{cur}$ as Low.

The attack consists of iterating (variable i) over the domain of the
secret (forM_ [0..n]), producing a publicly-observable output if the
guess, i, is not the value of the secret. When i is equal to the secret,
the program diverges (if s == i then undefined). We use the constant
undefined to denote any non-terminating computation. Observe
that on every iteration $L_{cur}$ is raised to the label of the secret within the
toLabeled block. However, as described in Section 2.4, the current label
outside the toLabeled block remains unaffected, and so the computation
can continue producing publicly-observable outputs. The leak
due to termination is obvious: when the attacker, observing the Low
labeled output channel, no longer receives any data, the value of the
secret can be inferred given the previous outputs. For instance, to leak
a 16-bit bounded secret, we can execute bruteForce "It is not: "
65535 secret. Assuming the value of the secret is 4, executing the action
produces the outputs "It is not: 0", "It is not: 1", "It is not: 2", "It is
not: 3" before diverging. An observer that knows the implementation of
bruteForce can directly infer that the value of the secret is 4. Observe
that the code producing public outputs (outputLow (msg ++ show i)) 
does not inspect secret data at all, which makes it difficult to avoid ter- 
mination leaks by simply tracking the flow of labeled data inside pro- 
grams. 
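The observer's inference can be made concrete with a small pure model. In the sketch below, divergence is modeled by truncating the list of public outputs instead of actually looping, and the function names observedOutputs and inferSecret are hypothetical names chosen for this illustration.

```haskell
-- Outputs that reach the Low channel before the program diverges.
-- The loop emits one message per guess until the guess equals the secret.
observedOutputs :: String -> Int -> Int -> [String]
observedOutputs msg n secret =
  [ msg ++ show i | i <- takeWhile (/= secret) [0 .. n] ]

-- An observer who knows the program counts the messages received.
-- If every guess in [0 .. n] produced output, the secret lies outside
-- the guessed range and nothing is learned about its exact value.
inferSecret :: Int -> [String] -> Maybe Int
inferSecret n outs
  | length outs > n = Nothing
  | otherwise       = Just (length outs)
```

With a secret of 4 and a bound of 10, the model yields four messages, and inferSecret recovers Just 4 from their count alone, without ever inspecting labeled data.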

Suppose that we (naively) add support for concurrency to LIO using
a hypothetical primitive fork, which simply spawns computations
in new threads. Although we can preserve termination-insensitive
non-interference (i.e., retain the property of having neither explicit nor
implicit flows), we can extend the previous brute-force attack to leak
information in linear, as opposed to exponential, time in the length of the
secret. In general, adding concurrency primitives in a straightforward
manner makes attacks that leverage the termination covert channel very
effective [19]. To illustrate this point, the attack of Listing 2 leaks the
bit-contents of a secret value in linear time as follows. Given the
bit-length k of a secret and the labeled secret, concurrentAttack returns an
action which, when executed, extracts the bits of the secret
(extractBit i s) and spawns a corresponding thread to recover them by
executing the brute-force attack of Listing 1
(bruteForce (show i ++ "-bit:") 1 iBit). Hence, by collecting the public
outputs generated by the different threads (having the form "0-bit:0",
"1-bit:1", "2-bit:1", etc.), it is directly possible to recover the secret
value.

Listing 2 A concurrent termination channel attack

concurrentAttack :: Int -> Labeled l Int -> LIO l ()
concurrentAttack k secret = forM_ [0..k] $ \i -> do
  iBit <- toLabeled High $ do
    s <- unlabel secret
    return (extractBit i s)
  fork $ bruteForce (show i ++ "-bit:") 1 iBit
  where extractBit :: Int -> Int -> Int
        extractBit i n = (shiftR n i) .&. (bit 0)
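The reason the concurrent attack is linear in the bit-length is that each thread only has to distinguish a one-bit value. The following pure sketch reuses extractBit from Listing 2 and shows that k single-bit observations suffice to rebuild the secret. The helper names leakedBits and reassemble are hypothetical, and the sketch assumes a non-negative secret that fits in k bits.

```haskell
import Data.Bits (bit, shiftL, shiftR, (.&.), (.|.))

-- As in Listing 2: the i-th bit of n, as 0 or 1.
extractBit :: Int -> Int -> Int
extractBit i n = (shiftR n i) .&. (bit 0)

-- What k threads would leak, least significant bit first.
leakedBits :: Int -> Int -> [Int]
leakedBits k secret = [ extractBit i secret | i <- [0 .. k - 1] ]

-- The observer rebuilds the secret from the leaked bits.
reassemble :: [Int] -> Int
reassemble = foldr (\b acc -> (acc `shiftL` 1) .|. b) 0
```

For a 16-bit secret this takes 16 one-bit leaks, whereas the sequential brute-force attack may need up to 65536 iterations.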

3.1 Removing the termination covert channel in LIO 

Since LIO is a floating-label system and at each point in the computation 
the evaluation context has a current label, a leak to a Low channel due 
to termination cannot occur after the current label is raised to High,
unless the label-raise is within an enclosed toLabeled computation. Unless
enclosed within a toLabeled, having $L_{cur}$ = High implies that
publicly-observable side-effects are no longer allowed. Hence, we can deduce
that a piece of LIO code can exploit the termination covert channel only 
when using toLabeled. The key insight is that toLabeled is the single 
primitive in LIO that effectively allows a piece of code to temporarily 
raise its current label, perform a computation, and then continue with 
the starting current label. The attack in Listing 1 is a clear example that 
leverages this property of toLabeled to leak information. 

Consider the necessary conditions for eliminating the termination 
channel present in Listing 1: the execution of the publicly-observable 
outputLow action must not depend on, or wait for, the secret computa- 
tion executed within the toLabeled block. More generally, to close the 
termination covert channel, it is necessary to decouple the execution of 
computations enclosed by toLabeled. To achieve such decoupling, in- 
stead of using toLabeled, we provide an alternative primitive that exe- 
cutes computations that might raise the current label (as in toLabeled) 
in a newly-spawned thread. Moreover, to observe the result (or
non-termination) of a spawned computation, the current label is first raised
to the label of the (possibly) returned result. In doing so, after observing
a secret result (or non-termination) of a spawned computation, actions 
that produce publicly-observable side-effects can no longer be executed. 
In this manner, the termination channel is closed. 

In Listing 1, the execution of outputLow is bound to the termination 
of the computation described by toLabeled. However, using our pro- 
posed approach of spawning a new thread when performing toLabeled, 
if the code following the toLabeled wishes to observe whether or not 
the High computation has terminated, it would first need to raise the 
current label to High. Thereafter, an outputLow action cannot be exe- 
cuted regardless of the result (or termination) of the toLabeled compu- 
tation. 

Concretely, we close the termination channel by removing the insecure
function toLabeled from LIO and, instead, provide the following
(termination-sensitive) primitives.

forkLIO :: Label l => l -> LIO l a -> LIO l (Result l a)
waitLIO :: Label l => Result l a -> LIO l a

Intuitively, forkLIO can be considered as a concurrent version of
toLabeled. forkLIO l lio spawns a new thread to perform the computation
lio, whose current label may rise, and whose result is a value
labeled with l. Rather than block, immediately after spawning a new
thread, the primitive returns a value of type Result l a, which is simply
a handle to access the labeled result produced by the spawned computation.
Similar to unlabel, we provide waitLIO, which inspects values
returned by spawned computations, i.e., values of type Result l a.
The labeled wait, waitLIO, raises the current label to the label of its
argument and then proceeds to inspect it.
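The decoupling idea can be sketched in plain IO using standard MVars. This is not the library's implementation: the names forkSketch, waitSketch, and outputLowSketch are hypothetical, the current label is kept in an IORef rather than inside a monad, the spawned computation's own label is not tracked, and exceptions in the child thread are ignored (a waiter on a crashed child would block).

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (MVar, newEmptyMVar, putMVar, readMVar)
import Data.IORef (IORef, modifyIORef', newIORef, readIORef)

data Lbl = Low | High
  deriving (Eq, Ord, Show)

-- A handle to a result that another thread will eventually produce.
data Result a = Result Lbl (MVar a)

-- Spawn the computation and return at once; the caller's current label
-- is not touched, so the caller never waits on the secret computation.
forkSketch :: Lbl -> IO a -> IO (Result a)
forkSketch l act = do
  box <- newEmptyMVar
  _ <- forkIO (act >>= putMVar box)
  return (Result l box)

-- Raise the caller's current label first, and only then block.
waitSketch :: IORef Lbl -> Result a -> IO a
waitSketch cur (Result l box) = do
  modifyIORef' cur (max l)
  readMVar box

-- A public output is performed only while the current label is Low.
-- The returned Bool reports whether the output was allowed.
outputLowSketch :: IORef Lbl -> String -> IO Bool
outputLowSketch cur msg = do
  l <- readIORef cur
  if l == Low then putStrLn msg >> return True else return False

-- A small scenario: output, wait on a High result, try to output again.
demo :: IO (Bool, Int, Bool)
demo = do
  cur    <- newIORef Low
  res    <- forkSketch High (return (42 :: Int))
  before <- outputLowSketch cur "public event before waiting"
  val    <- waitSketch cur res
  after  <- outputLowSketch cur "public event after waiting"
  return (before, val, after)
```

In demo, the first public output succeeds because forking does not raise the label, while the second is refused because waiting on the High result has already raised it, regardless of whether or when the child terminates.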

In principle, rather than forking threads, it would be enough to prove 
that computations involving secrets terminate, e.g., by writing them in 
Coq or Agda. However, while this idea works in theory, it is still possible 
to crash an Agda or Coq program at runtime: for example, with a stack 
overflow. Generally, abnormal termination due to resource exhaustion 
exploits the termination channel, and it could be hard to counter. In this
light, forking threads is a way to remove the termination channel by
design. Although it might seem expensive, forking threads in Haskell is
a light-weight operation 2.

We note that adding concurrency to LIO is a major modification which
introduces security implications beyond that of handling the termina- 
tion channel. In the following section, we describe the internal timing 
covert channel, a channel present in programming languages that have 
support for concurrency and shared-resources. 



2 http://www.haskell.org/ghc/docs/latest/html/libraries/base/Control-Concurrent.html
Listing 3 Internal timing leak

sthread :: String -> Int -> Labeled l Bool -> LIO l ()
sthread msg n secret = do toLabeled High
                            (do s <- unlabel secret
                                if s then sleep n
                                     else return ())
                          outputLow msg

pthread :: String -> Int -> LIO l ()
pthread msg n = do sleep n
                   outputLow msg

attack :: Labeled l Bool -> LIO l ()
attack secret = do fork (sthread "True" 5000 secret)
                   fork (pthread "False" 1000)



4 The internal timing covert channel

In a concurrent setting, the possibility that threads have to share re- 
sources opens up new information channels. Specifically, multi-threaded 
programs can leak information through the internal timing covert channel 
[50]. The source of the leaks comes from the ability of threads to affect 
their timing behavior based on secret data and thus affect, via the sched- 
uler, the order of public events. 

To illustrate internal timing attacks, we consider the LIO library 
from Section 2.3 with the added hypothetical primitive fork used to 
spawn a new thread. Listing 3 illustrates an internal timing attack. It 
consists of two threads: sthread and pthread. Command sleep n puts 
a thread to sleep for n milliseconds. Thread sthread takes a string to 
output in a public channel (outputLow msg) and the number of millisec- 
onds to sleep (sleep n) if the secret boolean taken as argument (secret) 
is true. Thread pthread, on the other hand, does not take any secret but 
it writes a message (msg) to the same output channel as sthread after 
sleeping some milliseconds. Observe that both threads share a resource 
(i.e., the output channel) and that the timing behavior of sthread de- 
pends on the secret boolean. 

With this example, sthread should take more time to execute the 
outputLow action than pthread if and only if the secret boolean is true. 
In isolation, both threads are secure, i.e., they satisfy non-interference. 
In fact, when considering them in isolation, both threads always pro- 
duce the public output given by the argument msg. However, by run- 
ning them concurrently, it is possible to leak information about secret. 
Function attack spawns two threads that execute sthread and pthread 
concurrently. Under many reasonable schedulers, if secret is true, it is 
more likely that the instruction outputLow "False" is executed first. On
the other hand, if secret is false, it is more likely that outputLow "True"
is executed first. An attacker can then observe the value of secret by 
just observing the second produced output. 
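The attack can be summarized by an idealized model in which each thread's public output is stamped with the time it has slept, and outputs appear in timestamp order. This assumes a scheduler that faithfully reflects sleep durations, which real schedulers only approximate; the names observedOrder and inferSecret are hypothetical.

```haskell
import Data.List (sortOn)

-- Order in which the two public messages of Listing 3 appear, assuming
-- outputs are emitted exactly when each thread finishes sleeping.
observedOrder :: Bool -> [String]
observedOrder secret = map snd (sortOn fst [(sTime, "True"), (pTime, "False")])
  where
    sTime, pTime :: Int
    sTime = if secret then 5000 else 0  -- sthread sleeps only if secret is true
    pTime = 1000                        -- pthread always sleeps 1000 ms

-- The observer reads the secret off the second message.
inferSecret :: [String] -> Bool
inferSecret outs = last outs == "True"
```

Under this model the second message always names the value of the secret, even though neither thread ever writes secret data to the channel.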

Unlike other timing channel attacks, internal timing attacks do not 
require an attacker to measure the actual execution time to deduce secret
information. The interleaving of threads alone is responsible for
producing leaks! Although the example in Listing 3 shows how to leak
one bit, it is easy to place the attack in a loop that leaks bit by bit a whole 
secret value in linear time. Tsai et al. [48] show how effective the attack is 
even without having much information about the run-time system (e.g., 
the scheduler). The authors implemented the magnified version of the 
attack in Listing 3 and showed how to leak a credit card number. 

4.1 Removing the internal timing channel 

As indicated by the code in Listing 3, the internal timing covert channel 
can be exploited when the time to produce public events (e.g., sending 
some data in a public channel) depends on secrets. In other words, inter- 
nal timing arises when there is a race to acquire a shared resource that 
may be affected by secret data. In order to close this channel, we apply 
the same technique as for dealing with termination leaks: we decouple 
the execution of public events from computations that manipulate secret
data. By using forkLIO and waitLIO, computations dealing with
secrets are spawned in a new thread. In that manner, any possible race 
to a shared public resource does not depend on the secret anymore and 
thus internal timing leaks are no longer possible. 

4.2 Synchronization primitives in concurrent LIO 

In the presence of concurrency, synchronization is vital. This section
introduces an IFC-aware version of MVars, which are well-established
Haskell synchronization primitives [24]. As with MVars, LMVars can be used
in different manners: as synchronized mutable variables, as channels of
depth one, or as building blocks for more complex communication and
synchronization primitives.

A value of type LMVar l a is a mutable location that is either empty
or contains a value of type a labeled with l. LMVars are associated with
the following operations:

newEmptyLMVar :: (Label l) => l -> LIO l (LMVar l a)
putLMVar :: (Label l) => LMVar l a -> a -> LIO l ()
takeLMVar :: (Label l) => LMVar l a -> LIO l a

Function newEmptyLMVar takes a label l and creates an empty LMVar l a
for any desired type a. The creation succeeds only if the label l is between
the current label and clearance of the LIO computation that creates
it. Function putLMVar fills an LMVar l a with a value of type a if it is
empty and blocks otherwise. Dually, takeLMVar empties an LMVar l a
if it is full and blocks otherwise.

Note that both takeLMVar and putLMVar check if the LMVar is empty
in order to proceed to modify its content. Precisely, takeLMVar and
putLMVar each perform both a read and a write of the mutable location.
Consequently, from a security point of view, operations on a given LMVar l a
are executed only when the label l is below or equal to the clearance
(i.e., $l \sqsubseteq C_{cur}$, due to the read) and above or equal to the current
label (i.e., $L_{cur} \sqsubseteq l$, due to the write). Moreover, after either
operation, $L_{cur}$ is raised to l.
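The guard shared by takeLMVar and putLMVar can be sketched as a single pure check. The names Ctx and lmvarGuard are hypothetical, labels are restricted to the two-point lattice, and a rejected operation is modeled as a Left value instead of an exception.

```haskell
data Lbl = Low | High
  deriving (Eq, Ord, Show)

data Ctx = Ctx { curLabel :: Lbl, clearance :: Lbl }
  deriving (Eq, Show)

-- Both take and put read and write the location, so the LMVar's label l
-- must sit between the current label and the clearance. On success the
-- current label becomes l (the join of the current label and l).
lmvarGuard :: Lbl -> Ctx -> Either String Ctx
lmvarGuard l ctx
  | not (curLabel ctx <= l)  = Left "denied: current label cannot flow to LMVar label"
  | not (l <= clearance ctx) = Left "denied: LMVar label exceeds clearance"
  | otherwise                = Right ctx { curLabel = l }
```

In particular, a thread whose current label is High is refused access to a Low LMVar, so it cannot empty or fill that LMVar depending on a secret.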

Many communication channels used in practice are often bi-directional, 
i.e., a read produces a write (and vice versa). For instance, reading a file 
may modify the access time in the inode; writing to a socket may pro- 
duce an observable error if the connection is closed, etc. As described 
above, LMVars are bi-directional channels. Observe that if we were to treat
them as uni-directional, a termination leak would be possible: a thread
whose current label is Low can use an LMVar labeled Low to send information
to a computation whose current label is High; the High thread can
then decide to empty the LMVar according to a secret value and thus leak 
information to the Low thread. 

5 The external timing covert channel 

In a real-world scenario IFC applications interact with unlabeled, or 
publicly observable, resources. For example, a server-side IFC web ap- 
plication interacts with a browser, which may itself be IFC-unaware, 
over a public network channel. Consequently, an adversary can take 
measurements external to the application (e.g., the application response 
time) from which they may infer information about confidential data 
computed by the application. Although our results generalize (e.g., to 
the storage covert channel), in this section we address the external timing 
covert channel: an application can leak information over a public channel 
to an observer that precisely measures message-arrival timings. Note 
that the content of a message does not need to be public (hence why the 
channel is considered covert); this is the case in a web application where 
a message may be encrypted with SSL, but the actual placement of a 
message on the channel is observable by a network attacker. 

Most of the language-based IFC techniques that consider external 
timing channels are limited. Despite the successful use of external tim- 
ing attacks to leak information in web [7, 17] and cryptographic [18, 26, 
51] applications, they remain widely unaddressed by mainstream, prac- 
tical IFC tools, including Jif [34]. Furthermore, most techniques that pro- 
vide IFC in the presence of the external timing channel [1, 5, 20] are 
overly restrictive, e.g., they do not allow folding over secret data. 






5.1 Mitigating the external timing channel 

Recently, a predictive black-box mitigation technique for external timing 
channels has been proposed [3, 53]. The predictive mitigation technique 
assumes that the attacker has control of the application (which computes 
on secret data) and can measure the time a message is placed on a chan- 
nel (e.g., when a response is sent to the browser). Treating the applica- 
tion as a black-box source of events, a mitigator is interposed between 
the application and the system output. 

Internally, the mitigator keeps a schedule describing when outputs are 
to be produced. For example, the timing mitigator might keep a schedule
"predicting" that an output is to be produced every 1 ms. If the application
delivers events according to the schedule, or at a higher rate, the
mitigator will be able to produce an output at every 1 ms interval, according
to the schedule, and thus leak no information.

Of course, the application may fail to deliver an event to the mitigator
on time, and thus render the mitigator's schedule prediction false.
At this point, the mitigator must handle the misprediction by selecting, 
or "predicting", a new schedule for the application. In most cases, this 
corresponds to doubling the application's quantum. For instance, fol- 
lowing a misprediction of a quantum of 1 ms, an application will be 
then expected to produce an output every 2 ms. It is at the point of 
switching schedules where an attacker learns information: rather than 
seeing events spaced at 1 ms intervals, the attacker now observes out- 
puts at 2 ms intervals, indicating that the application violated the pre- 
dicted behavior (a decision that can be affected by secret data). However, 
Askarov et al. [3] show that the amount of information leaked by this 
slow-doubling mitigator is polylogarithmic in the application runtime. 
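A simplified pure model of the doubling behaviour may help. In the sketch below, each event is represented by the time (in ms) at which the application hands it to the mitigator, and the result lists when each event is released together with the quantum in force. The function name releaseTimes is hypothetical, the model assumes a positive initial quantum, and on a misprediction it doubles the quantum and moves the next slot one new quantum past the missed slot, which is only one possible reading of the scheme in [3, 53].

```haskell
-- releaseTimes q0 readyTimes: pairs of (release time, quantum in force).
-- The first slot is at time q0; events are handled in arrival order.
releaseTimes :: Int -> [Int] -> [(Int, Int)]
releaseTimes q0 = go q0 q0
  where
    go _ _ [] = []
    go q slot (r : rs)
      -- The event is ready by the predicted slot: release it exactly then.
      | r <= slot = (slot, q) : go q (slot + q) rs
      -- Misprediction: double the quantum and try the next slot.
      | otherwise = go (2 * q) (slot + 2 * q) (r : rs)
```

For example, releaseTimes 1 [1, 2, 5, 6] releases the first two events at times 1 and 2 with quantum 1, then misses the slot at time 3, doubles the quantum, and releases the remaining events at times 5 and 7. The observer learns only that one misprediction happened, not the precise ready times 5 and 6.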

Furthermore, the aspects of the predictive mitigation technique of [3, 
53] that makes it particularly attractive for use in LIO are: 

► The mitigator can adaptively reduce the quantum, so as to increase the
throughput of a well-behaved application in a manner that bounds
the covert channel bandwidth (though the leakage factor is still greater
than that of the slow-doubling mitigator);

► The mitigator can leverage public factors to decide a schedule. For 
example, in a web application setting where responses are mitigated, 
the arrival of an HTTP request can be used as a "reset" event. This is 
particularly useful as a quiescent application would otherwise be pe- 
nalized (by increasing its quantum) for not producing an output ac- 
cording to the predicted schedule. Our web application of Section 8 
implements this mitigation technique;

► The amount of information leaked is bounded by a combinatorial analysis
of the number of observations an attacker can perform.

Monadic approach to black-box mitigation Pure functional programming
languages, such as Haskell, are particularly suitable for mitigating external
timing covert channels. Specifically, the use of monads for enforcing
an evaluation order and introducing side-effects makes it possible to reason
about and control output events. Among many others, LIO is an example
of a library that leverages this property of monads; LIO is simply a
monad that performs side-effects according to IFC.

The functionality of different monads, such as I/O and error han- 
dling, can be combined in a modular fashion using monad transform- 
ers [31]. A monad transformer t, when applied to a monad m, gener- 
ates a new, combined monad t m, that shares the behavior of monad 
m as well as the behavior of the monad encoded in the monad trans- 
former. The modularity of monad transformers comes from the fact that 
they consider the underlying monad m opaque, i.e., the behavior of the 
monad transformer t does not depend on the internal structure of m. In 
this light, we adapt Zhang et al.'s system-oriented predictive black-box
mitigator to a language-based security setting in the form of a monad 
transformer. 
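
To make the opacity property concrete, the following sketch defines a small transformer from scratch. It is an illustration only and is unrelated to the MitM transformer's actual definition: CountT, tick, and this local lift are hypothetical names, and the added behaviour is simply an event counter. Note that no instance inspects the structure of m; each one only uses m's own return and bind.

```haskell
-- A hypothetical counting transformer; the underlying monad m is opaque.
newtype CountT m a = CountT { runCountT :: Int -> m (a, Int) }

instance Monad m => Functor (CountT m) where
  fmap f (CountT g) = CountT $ \n -> do
    (a, n') <- g n
    return (f a, n')

instance Monad m => Applicative (CountT m) where
  pure a = CountT $ \n -> return (a, n)
  CountT f <*> CountT g = CountT $ \n -> do
    (h, n')  <- f n
    (a, n'') <- g n'
    return (h a, n'')

instance Monad m => Monad (CountT m) where
  CountT g >>= k = CountT $ \n -> do
    (a, n') <- g n
    runCountT (k a) n'

-- | Run an action of the underlying monad without touching the counter.
lift :: Monad m => m a -> CountT m a
lift act = CountT $ \n -> do
  a <- act
  return (a, n)

-- | Behaviour added by the transformer itself: count one event.
tick :: Monad m => CountT m ()
tick = CountT $ \n -> return ((), n + 1)
```

The same CountT works unchanged over Maybe, IO, or any other monad, which is the modularity the text refers to.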

5.2 Language-based mitigators 

We envision the implementation of mitigators that address covert chan- 
nels other than external timing. For example, our ongoing work includes 
the implementation of a storage mitigator that addresses attacks which 
vary message (packet) length to encode secret information. Hence, our 
mitigation monad transformer MitM s q is polymorphic in the mitigator-specific
state s and quantum type q:

newtype MitM s q m a = MitM ...

The time-mitigated monad transformer is a special case: 

type TimeMitM = MitM TStamp TStampDiff

where the internal state TStamp is a time stamp, and the quantum
TStampDiff is a time difference. Superficially, a value of type TimeMitM
m a is a computation that produces a value of type a. Internally, a time 
measurement is taken whenever an output is to be emitted in the un- 
derlying monad m, the internal state and quantum are adjusted to reflect 
the event, and the output is delayed if it was produced ahead of the 
predicted schedule. 

We provide the function evalMitM, which takes an action of type 
MitM s q m a and returns an action of type m a, which when executed 
will mitigate the computation outputs. Observe that the monad trans- 
former leaves the possibility to use (almost) any underlying monad m, 
not just LIO or IO; this makes the monad transformer approach to mitigation
quite general.

Unfortunately, this generality comes with a trade-off: either every
computation m is mitigated, or trustworthy programmers must define 
what objects they wish to mitigate and how to mitigate them. Given that 
the former design choice would not allow for distinguishing between 
inputs and outputs, we implemented the latter and more explicit miti- 
gation approach. 

To define what is to be mitigated (e.g., a file handle, a socket, a refer- 
ence, etc.), we provide the data type: 

data Mitigated s q a = Mitigated ...

For example, a time-mitigated I/O file handle is simply: 

type TimeMitigated = Mitigated TStamp TStampDiff
type Handle = TimeMitigated IO.Handle

The use of Mitigated allows us to do mitigation at a very fine-grained level.
Specifically, the monad transformer can be used to implement a mitigator
for each Mitigated value (henceforth "handle"). This allows an
application to write to multiple files, all of which are mitigated independently,
and thus may be written to at different rates.³ It remains for us
to address: how are the mitigators defined?

³ In cases where schedule mispredictions are common, it is important to implement the l-grace period policy of [53]. The policy states that
when there are more than l mispredictions, the new scheduling should affect all mitigators.

Mitigators are defined as instances of the type class Mitigator, which
provides two functions:

class MonadConcur m => Mitigator m s q where
  -- | Create a Mitigated "handle".
  mkMitigated :: Maybe s   -- ^ Internal state
              -> q         -- ^ Quantum
              -> m a       -- ^ Handle constructor
              -> MitM s q m (Mitigated s q a)

  -- | Mitigate an operation
  mitigate :: Mitigated s q a   -- ^ Mitigated "handle"
           -> (a -> m ())       -- ^ Output computation
           -> MitM s q m ()

Firstly, we note the context MonadConcur m is used to impose the requirement
that the underlying monad be an IO-like monad which allows
forking new threads (so as to separate the mitigator from the computation
being mitigated) and operations on mutable MVars (which are internal
to the MitM transformer). Secondly, we highlight the mkMitigated function,
which is used to create a mitigated handle given an initial state,
quantum, and underlying constructor. The default implementation of
mkMitigated creates the mitigator state (internal to the transformer) corresponding
to the handle. A simplified version of our openFile operation
shows how mkMitigated is used:

openFile :: FilePath -> IOMode -> TimeMitigated IO Handle
openFile f mode = mkMitigated Nothing q $ do
    h <- IO.openFile f mode   -- Handle constructor
    return h
  where q = mkQuant 1000      -- Initial quantum of 1ms

Here, the constructor IO.openFile creates a file handle to the file at
path f. This constructor is supplied to mkMitigated, in addition to the
"empty" state Nothing and the initial quantum q of 1 ms, which creates the
corresponding mitigator and Mitigated handle (recall that Handle is a type
alias for TimeMitigated IO.Handle). We note that although the default
definition of mkMitigated creates a mitigator per handle, instances may
provide a definition that is more coarse-grained (e.g., associating a mitigator
with the current thread).

Finally, each mitigator provides a definition for mitigate, which 
specifies how a computation should be mitigated. The function takes 
two arguments: the mitigated handle and a computation that produces 
an output on the handle. Our time mitigator instance 

instance ... => Mitigator m TStamp TStampDiff where
  mitigate mh act = ...

provides a definition for mitigate. The action first retrieves the internal
state of the mitigator corresponding to the mitigated handle mh and
forks a new thread (allowing other mitigated actions to be executed). In
the new thread, a time measurement t₁ is taken. Then, if the time difference
between t₁ and the mitigator time stamp t₀ exceeds the quantum
q, the new mitigator quantum is set to 2q; otherwise, the computation is
"suspended" for q − (t₁ − t₀) microseconds. Following this, act is executed, and
the internal timestamp is replaced with the current time. Using MVars,
we force operations on the same handle to be sequential and thus follow
the latest schedule.
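
The steps above can be sketched directly in IO. This is a simplified illustration under stated assumptions, not the library's instance: MitState, newMitigator, decide, and mitigateIO are hypothetical names, the state lives in a plain MVar rather than inside the MitM transformer, the forking of a fresh thread per action is omitted, and an early output is assumed to wait for the remainder of the current quantum.

```haskell
import Control.Concurrent (threadDelay)
import Control.Concurrent.MVar (MVar, newMVar, modifyMVar_)
import Data.Word (Word64)
import GHC.Clock (getMonotonicTimeNSec)

-- | Hypothetical per-handle state: time of the last output and the
-- current quantum, both in microseconds.
data MitState = MitState Word64 Word64

nowMicros :: IO Word64
nowMicros = (`div` 1000) <$> getMonotonicTimeNSec

newMitigator :: Word64 -> IO (MVar MitState)
newMitigator q = do
  t0 <- nowMicros
  newMVar (MitState t0 q)

-- | Pure scheduling decision: (next quantum, microseconds to wait).
decide :: Word64 -> Word64 -> (Word64, Word64)
decide q elapsed
  | elapsed > q = (2 * q, 0)        -- misprediction: double the quantum
  | otherwise   = (q, q - elapsed)  -- early: wait out the rest of the slot

-- | Run an output action under the mitigator. Holding the MVar for the
-- whole step serialises operations on the same handle.
mitigateIO :: MVar MitState -> IO () -> IO ()
mitigateIO ref act = modifyMVar_ ref $ \(MitState t0 q) -> do
  t1 <- nowMicros
  let (q', wait) = decide q (t1 - t0)
  threadDelay (fromIntegral wait)
  act
  t2 <- nowMicros
  return (MitState t2 q')
```

Keeping the decision in the pure function decide makes the policy easy to test separately from the clock and the delay.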

Continuing the example, we can now define a function we wish to 
be mitigated: 

hPut :: Handle -> ByteString -> TimeMitigated IO ()
hPut mH bs = mitigate mH (\h -> IO.hPut h bs)

If hPut is invoked according to schedule (at least every 1 ms), the actual
output function IO.hPut is used to write the provided byte-strings
every 1 ms. Conversely, if the function does not follow the predicted 
schedule, the quantum will be increased and write-throughput to the 
file will decrease. Of course, this does not affect the schedule on a differ- 
ent handle (until a large number of mispredictions occur). 

Listing 4 Syntax for values, expressions, and types.

Label:      l
LMVar:      m
Value:      v ::= true | false | () | l | m | x | λx.e | fix e
                | Lb l e | (e)^LIO | □ | R m | •
Expression: e ::= v | e e | if e then e else e | let x = e in e
                | return e | e >>= e | label e e
                | unlabel e | lowerClr e | getLabel
                | getClearance | labelOf e | out e e
                | forkLIO e e | waitLIO e | newLMVar e e
                | takeLMVar e | putLMVar e e | labelOfLMVar e
Type:       τ ::= Bool | () | τ → τ | ℓ | Labeled ℓ τ | Result ℓ τ
                | LMVar ℓ τ | LIO ℓ τ

Adapting an existing program to have mitigated outputs comes almost
for free: a trustworthy programmer needs to define the constructor
functions, such as openFile, and output functions, such as hPut,
and simply lift all the remaining operations. Recall that MitM is a monad
transformer, and thus we provide a definition for the function:

lift :: Monad m => m a -> MitM s q m a

which lifts a computation in the m monad into the mitigation monad, 
without performing any actual mitigation. A simple example illustrat- 
ing this is the definition of hGet which reads a specified number of bytes 
from a handle: 

hGet :: Handle -> Int -> TimeMitigated IO ByteString
hGet mH n = lift $ IO.hGet (mitVal mH) n

where mitVal returns the handle encapsulated by Mitigated. It is worth
noting that, although we focus on mitigating writing operations, in some 
systems a file read will be reflected in the file's inode atime, and thus 
should be also accordingly mitigated. 

6 Formal semantics for LIO 

In this section, we formalise our library for a simply typed Curry-style
call-by-name λ-calculus with some extensions. Listing 4 defines the formal
syntax for the language. Syntactic categories v, e, and τ represent
values, expressions, and types, respectively. Values are side-effect free
while expressions denote (possibly) side-effecting computations. Due to
lack of space, we only show the reduction and typing rules for the core 
part of the library. For more details, readers can refer to Appendix 1 
available in the supplementary material.

Values The syntax category v includes the symbols true and false, representing
Boolean values. Symbol () represents the unit value. Symbol l
denotes security labels. Symbol m represents MVars. Values include variables
(x), functions (λx.e), and recursive functions (fix e). Special syntax
nodes are added to this category: Lb l e, (e)^LIO, R m, and •. Node
Lb l e denotes the run-time representation of a labeled value. Similarly,
node (e)^LIO denotes the run-time representation of a monadic LIO computation.
Node □ denotes the run-time representation of an empty MVar.
Node R m is the run-time representation of a handle, implemented as an
MVar, that is used to access the result produced by spawned computations.
Alternatively, R m can be thought of as an explicit future. Node •
represents an erased term (explained in Section 7). None of these special
nodes appear in programs written by users and they are merely introduced
for technical reasons.

Expressions Expressions are composed of values (v), function applications
(e e), conditional branches (if e then e else e), and local definitions
(let x = e in e). Additionally, expressions may involve operations
related to monadic computations in the LIO monad. More precisely,
return e and e >>= e represent the monadic return and bind
operations. Monadic operations related to the manipulation of labeled
values inside the LIO monad are given by label and unlabel. Expression
unlabel e acquires the content of the labeled value e while
in a LIO computation. Expression label e₁ e₂ creates a labeled value,
with label e₁, of the result obtained by evaluating the LIO computation
e₂. Expression lowerClr e allows lowering of the current clearance to
e. Expressions getLabel and getClearance return the current label
and current clearance of an LIO computation. Expression labelOf e obtains
the security label of labeled values. Expression out e₁ e₂ denotes
the output of e₂ to the output channel at security level e₁. For simplicity,
we assume that there is only one output channel per security level. Expression
forkLIO e₁ e₂ spawns a thread that computes e₂ and returns
a labeled value with label e₁. Expression waitLIO e inspects the value
returned by the spawned computation whose result is accessed by the
handle e. Non-proper morphisms related to creating, reading, and writing
labeled MVars are respectively captured by expressions newLMVar,
takeLMVar, and putLMVar.

Types We consider standard types for Booleans (Bool), unit (()), and
function (τ → τ) values. Type ℓ describes security labels. Type Result ℓ τ
denotes handles used to access labeled results produced by spawned
computations, where the results are of type τ and labeled with labels
of type ℓ. Type LMVar ℓ τ describes labeled MVars, with labels of type
ℓ and storing values of type τ. Type LIO ℓ τ represents monadic LIO
computations, with a result type τ and security labels of type ℓ.

Listing 5 Typing rules for special syntax nodes.

Γ ⊢ e : τ
Γ ⊢ • : τ
Γ ⊢ m : LMVar ℓ τ
Γ ⊢ Lb l e : Labeled ℓ τ
Γ ⊢ m : LMVar ℓ τ
Γ ⊢ (e)^LIO : LIO ℓ τ
Γ ⊢ R m : Result ℓ τ

The typing judgments have the standard form Γ ⊢ e : τ, such that
expression e has type τ assuming the typing environment Γ; we use
Γ for both variable and store typings. Typing rules for the special syntax
nodes are shown in Listing 5. These rules are liberal on purpose.
Recall that special syntax nodes are run-time representations of certain
values, e.g., labeled MVars. Thus, they are only considered in a context
where it is possible to uniquely deduce their types. The typing rules for the
remaining terms and expressions are standard and we therefore do not describe
them any further. We do not require any of the sophisticated features
of Haskell's type-system, a direct consequence of the fact that security
checks are performed at run-time. Since the typing rules are straightforward,
we assume that the type system is sound with respect to our
semantics.

The LIO monad is essentially implemented as a State monad. To simplify
the formalization and description of expressions, without loss of
generality, we make the state of the monad part of the run-time environment.
More precisely, each thread is accompanied by a local security
run-time environment σ, which keeps track of the current label (σ.lbl)
and clearance (σ.clr) of the running LIO computation. Common to every
thread, the symbol Σ holds the global LMVar store (Σ.φ) and the
output channels (Σ.α_l, one for every security label l). A store φ is a mapping
from LMVars to labeled values, while an output channel is a queue
of events of the form out(v) (output) or exit(v) (termination), for some
value v. For simplicity, we assume that every store contains a mapping
for every possible LMVar, which is initially the syntax node (•). The run-time
environments Σ, σ, and a LIO computation form a sequential configuration
⟨Σ, ⟨σ, e⟩⟩.

The relation ⟨Σ, ⟨σ, e⟩⟩ →^γ ⟨Σ′, ⟨σ′, e′⟩⟩ represents a single evaluation
step from expression e, under the run-time environments Σ and σ,
to expression e′ and run-time environments Σ′ and σ′. We define this
relation in terms of a structured operational semantics via evaluation
contexts [16]. We say that e reduces to e′ in one step. We write →* for
the reflexive and transitive closure of →. Symbol γ ranges over the internal
events triggered by expressions (as illustrated in Listing 6 and explained
below). We utilize internal events to communicate between the
threads and the scheduler. Listing 6 shows the reduction rules for the
core contributions in our library. Rules (LAB) and (UNLAB) impose the
same security constraints as for the sequential version of LIO [46]. Rule
(LAB) generates a labeled value if and only if the label is between the
current label and clearance of the LIO computation. Rule (UNLAB) requires
that, when the content of a labeled value is "retrieved" and used
in a LIO computation, the current label is raised (σ′ = σ[lbl ↦ l′],
where l′ = σ.lbl ⊔ l), thus capturing the fact that the remaining computation
might depend on e. Output channels are treated as deques
of events. We use a standard deque-like interface with operations (◁)
and (▷) for front and back insertion (respectively), and we also allow
pattern-matching in the rules as a representation of deconstruction operations.
Rule (OUTPUT) adds the event out(v) to the end of the output
channel at security level l (Σ.α_l ▷ out(v)).
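
The side conditions of (LAB) and (UNLAB) can be illustrated with a small executable model. This is a sketch under assumptions, not the formal rules: it uses a hypothetical two-point lattice (LIO itself is parametric in the label type), the names Env, labelR, and unlabelR are invented for the example, and the clearance check in unlabelR follows the behaviour of sequential LIO rather than anything stated in the paragraph above.

```haskell
-- Hypothetical two-point lattice used only for this illustration.
data Label = Low | High deriving (Eq, Ord, Show)

-- | The "can flow to" relation (⊑) and the join (⊔) of the lattice.
canFlowTo :: Label -> Label -> Bool
canFlowTo = (<=)

lub :: Label -> Label -> Label
lub = max

-- | Thread-local security environment σ: current label and clearance.
data Env = Env { lbl :: Label, clr :: Label } deriving (Eq, Show)

data Labeled a = Lb Label a deriving (Eq, Show)

-- | (LAB): permitted only if σ.lbl ⊑ l ⊑ σ.clr; σ is left unchanged.
labelR :: Env -> Label -> a -> Maybe (Labeled a)
labelR env l x
  | lbl env `canFlowTo` l && l `canFlowTo` clr env = Just (Lb l x)
  | otherwise                                      = Nothing

-- | (UNLAB): raise the current label to σ.lbl ⊔ l. The clearance check
-- is an assumption of this sketch.
unlabelR :: Env -> Labeled a -> Maybe (Env, a)
unlabelR env (Lb l x)
  | l' `canFlowTo` clr env = Just (env { lbl = l' }, x)
  | otherwise              = Nothing
  where l' = lub (lbl env) l
```

After unlabelR succeeds on a High value, the returned environment has a High current label, so any later attempt to produce Low data with labelR is rejected.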

The main contributions of our language are related to the primitives
for concurrency and synchronization. Rule (LFORK) allows for the creation
of a thread and generates the internal event fork(e), where e is the
computation to spawn. The rule allocates a new LMVar in order to store
the result produced by the spawned thread (e >>= λx. putLMVar m x).
Using that LMVar, the rule provides a handle to access the thread's
result (return (R m)). Rule (LWAIT) simply uses the LMVar for the handle.
As mentioned in Section 4, operations on LMVars are bi-directional
and consequently the rules (NLMVAR), (TLMVAR), and (PLMVAR) require
not only that the label of the mentioned LMVar be between the current
label and current clearance of the thread (σ.lbl ⊑ l ⊑ σ.clr), but
also that the current label be raised appropriately. Considering the security
level of an LMVar (l), rule (TLMVAR) accordingly raises the current label
(σ′ = σ[lbl ↦ σ.lbl ⊔ l]) when emptying (Σ.φ[m ↦ Lb l □]) its content
(Σ.φ(m) = Lb l e). Similarly, considering the security level of an LMVar
(l), rule (PLMVAR) accordingly raises the current label (σ′ = σ[lbl ↦
σ.lbl ⊔ l]) when filling (Σ.φ[m ↦ Lb l e]) its content (Σ.φ(m) = Lb l □).
Finally, rule (GLABR) fetches a labeled LMVar from the LMVar store (e =
Σ.φ(m), i.e., a value of the form Lb l e′), and returns its label.

Listing 7 shows the formal semantics for threadpools. The relation ↪
represents a single evaluation step for the threadpool, in contrast with
→, which is only for a single thread. We write ↪* for the reflexive and
transitive closure of ↪. As mentioned, configurations are of the form
⟨Σ, t_s⟩, where Σ is the global runtime environment and t_s is a queue
of sequential configurations. The front of the queue is the thread that
is currently executing. Threads are scheduled in a round-robin fashion,
as in GHC. The thread at the front of the queue executes one step, and
it is then moved to the back of the queue (see Rule (STEP)). If this step
involves a fork (represented by →^fork(e)), a new thread is created at the
back of the queue (see Rule (FORK)). Threads are also moved to the back
of the threadpool if they are blocked, e.g., waiting to read a value from
an empty LMVar (see Rule (NO-STEP); we define ↛ as the impossibility
of making any progress). When a thread finishes, i.e., it can no longer
reduce, the final value is placed in the output channel indicated by the
current label (σ.lbl), and the thread is removed from the queue (see
Rule (EXIT)).
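
The four threadpool rules can be read as a single function on a queue of threads. The sketch below is illustrative only: Step and stepPool are hypothetical names, a thread is any value paired with a step function, the global environment Σ and the exit event written to the output channel are left out, and placing the parent before the child after a fork is an assumption of the model.

```haskell
-- | Outcome of asking the thread at the front of the queue to take a step.
data Step t
  = Stepped t   -- made progress (rule STEP)
  | Forked t t  -- made progress and spawned a child (rule FORK)
  | Blocked     -- cannot progress right now (rule NO-STEP)
  | Done        -- finished and leaves the pool (rule EXIT)

-- | One round-robin scheduler step; the head of the list is the thread
-- that is currently executing.
stepPool :: (t -> Step t) -> [t] -> [t]
stepPool _    []       = []
stepPool step (t : ts) = case step t of
  Stepped t'      -> ts ++ [t']         -- move to the back of the queue
  Forked t' child -> ts ++ [t', child]  -- parent first, then the new thread
  Blocked         -> ts ++ [t]          -- retry later, unchanged
  Done            -> ts                 -- removed from the queue
```

Because the next thread to run depends only on queue order, and never on how long a step took, the interleaving in this model is independent of the timing of individual steps.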

7 Security guarantees 

In this section, we show that LIO computations have the property of
termination-sensitive non-interference. As in [30, 38, 46], we prove this
property by using the term erasure technique. The erasure function ε_L
rewrites data at security levels that the attacker cannot observe into the
syntax node •.

Listing 8 defines the erasure function ε_L. This function is defined
in such a way that ε_L(e) contains no information above level L, i.e.,
the function ε_L replaces all the information more sensitive than L in
e with a hole (•). In most of the cases, the erasure function is simply
applied homomorphically (e.g., ε_L(e₁ e₂) = ε_L(e₁) ε_L(e₂)). For threadpools,
the erasure function is mapped into all sequential configurations;
all threads with a current label above L are removed from the pool
(filter (λ⟨σ, e⟩. e ≢ •) (map ε_L t_s), where ≡ denotes syntactic equivalence).
The computation performed in a certain sequential configuration
is erased if the current label is above L. For runtime environments and
stores, we map the erasure function into their components. An output
channel is erased into the empty channel (ϵ) if it is above L; otherwise
the individual output events are erased according to ε_L. Similarly, a labeled
value is erased if the label assigned to it is above L.
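
As an illustration of how such an erasure function behaves, the sketch below implements it for a deliberately tiny term language. It is not Listing 8: Term, erase, and indistinguishable are hypothetical names, the lattice has only two points, and keeping the label of an erased labeled value visible (erasing only its content) is an assumption of this model.

```haskell
data Label = Low | High deriving (Eq, Ord, Show)

-- | A tiny term language: constants, labeled values, application, and
-- the hole • that stands for erased data.
data Term
  = Lit Int
  | Lb Label Term
  | App Term Term
  | Hole
  deriving (Eq, Show)

-- | Erasure at attacker level lvl: content labeled above lvl becomes •,
-- and every other construct is traversed homomorphically.
erase :: Label -> Term -> Term
erase _   (Lit n)   = Lit n
erase lvl (Lb l e)
  | l <= lvl        = Lb l (erase lvl e)
  | otherwise       = Lb l Hole
erase lvl (App f x) = App (erase lvl f) (erase lvl x)
erase _   Hole      = Hole

-- | Two terms look the same to the attacker when their erasures coincide.
indistinguishable :: Label -> Term -> Term -> Bool
indistinguishable lvl a b = erase lvl a == erase lvl b
```

In this model erase is idempotent (erasing twice equals erasing once), and two terms that differ only inside High-labeled values are indistinguishable at level Low.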

Following the definition of the erasure function, we introduce a new
evaluation relation →_L as follows:

      ⟨Σ, t⟩ → ⟨Σ′, t′⟩
  ──────────────────────────
  ⟨Σ, t⟩ →_L ε_L(⟨Σ′, t′⟩)

The relation →_L guarantees that confidential data, i.e., data not below
level L, is erased as soon as it is created. We write →*_L for the reflexive
and transitive closure of →_L. Similarly, we introduce a relation ↪_L as
follows:

      ⟨Σ, t_s⟩ ↪ ⟨Σ′, t′_s⟩
  ──────────────────────────────
  ⟨Σ, t_s⟩ ↪_L ε_L(⟨Σ′, t′_s⟩)

As usual, we write ↪*_L for the reflexive and transitive closure of ↪_L.

In order to prove non-interference, we will establish a simulation relation
between ↪* and ↪*_L through the erasure function: erasing all secret
data and then taking evaluation steps in ↪_L is equivalent to taking
steps in ↪ first, and then erasing all secret values in the resulting configuration.
Note that this relation would not hold if information from
some level above L was being leaked by the program. In the rest of this
section, we only consider well-typed terms to ensure there are no stuck
configurations.

For simplicity, we assume that the address space of the memory store
is split into different security levels and that allocation is deterministic.
Therefore, the address returned when creating an LMVar with label l
depends only on the LMVars with label l already in the store. For the
sake of brevity, the proofs have been shortened in this section, but more
details can be found in Appendix 1.

We start by showing that the evaluation relations →_L and ↪_L are
deterministic.

Proposition 1 (Determinacy of →_L). If ⟨Σ, t⟩ →_L ⟨Σ′, t′⟩ and
⟨Σ, t⟩ →_L ⟨Σ″, t″⟩, then ⟨Σ′, t′⟩ = ⟨Σ″, t″⟩.

Proof. By induction on expressions and evaluation contexts, showing
there is always a unique redex in every step.

Proposition 2 (Determinacy of ↪_L). If ⟨Σ, t_s⟩ ↪_L ⟨Σ′, t′_s⟩ and ⟨Σ, t_s⟩
↪_L ⟨Σ″, t″_s⟩, then ⟨Σ′, t′_s⟩ = ⟨Σ″, t″_s⟩.

Proof. By induction on expressions and evaluation contexts, showing
there is always a unique redex in every step and using Proposition 1.

The next lemma establishes a simulation between ↪* and ↪*_L.

Lemma 1 (Many-step simulation). If ⟨Σ, t_s⟩ ↪* ⟨Σ′, t′_s⟩, then
ε_L(⟨Σ, t_s⟩) ↪*_L ε_L(⟨Σ′, t′_s⟩).

Proof. In order to prove this result, we rely on properties of the erasure
function, such as the fact that it is idempotent and homomorphic to the
application of evaluation contexts and substitution. We show that the
result holds by case analysis on the rule used to derive ⟨Σ, t_s⟩ ↪* ⟨Σ′, t′_s⟩,
and considering different cases for threads whose current label is below
(or not) level L. For more details, see Appendix 1.

The L-equivalence relation ≈_L is an equivalence relation between
configurations (and their parts), defined as the equivalence kernel of the
erasure function ε_L: ⟨Σ, t_s⟩ ≈_L ⟨Σ′, r_s⟩ iff ε_L(⟨Σ, t_s⟩) = ε_L(⟨Σ′, r_s⟩). If
two configurations are L-equivalent, they agree on all data below or at
level L, i.e., they cannot be distinguished by an attacker at level L. Note
that two queues are L-equivalent iff the threads with current label no
higher than L are pairwise L-equivalent in the order that they appear in
the queue.

The next theorem shows the non-interference property. It essentially
states that if we take two executions of a program with two L-equivalent 
inputs, then for every intermediate step of the computation of the first 
run, there is a corresponding step in the computation of the second run 
which results in an L-equivalent configuration. Note that this also in- 
cludes the termination channel, since L-equivalence of configurations 
requires that output channels have matching events, and termination is 
modelled as a special kind of output event. 

Theorem 1 (Termination-sensitive non-interference). Given a computation
e (with no Lb, (·)^LIO, R, and •) where Γ ⊢ e : Labeled ℓ τ →
LIO ℓ (Labeled ℓ τ′), an attacker at level L, an initial security context σ,
and runtime environments Σ₁ and Σ₂ where Σ₁.φ = Σ₂.φ = ∅ and Σ₁.α_k =
Σ₂.α_k = ϵ for all levels k, then

∀e₁ e₂. (Γ ⊢ e_i : Labeled ℓ τ)_{i=1,2} ∧ e₁ ≈_L e₂
    ∧ ⟨Σ₁, ⟨σ, e e₁⟩⟩ ↪* ⟨Σ₁′, t¹_s⟩
  ⇒ ∃Σ₂′ t²_s. ⟨Σ₂, ⟨σ, e e₂⟩⟩ ↪* ⟨Σ₂′, t²_s⟩ ∧ ⟨Σ₁′, t¹_s⟩ ≈_L ⟨Σ₂′, t²_s⟩

Proof. The result follows by combining Lemma 1 and Proposition 2 (De- 
terminacy). 

8 Example Application: Dating Website 

In this section we evaluate the feasibility of leaking information through 
timing-based covert channels as well as the effectiveness and expres- 
siveness of our extensions to LIO. 

We built a simple dating website that allows third-party develop- 
ers to build applications that interact with a common database. Our 
website exposes a shared key-value store to third-party apps encoding 
interested-in relationships. A key corresponds to a user ID and its associated
value represents the users that he/she is interested in. For simplicity,
we do not consider the list of users sensitive, but interested-in relation- 
ships should remain confidential. In particular, a user should be able to 
learn which other users are interested in them, but should not be able to 
learn the interested-in relationships of other users. 

The website consists of two main components: 1) a trusted web server 
that executes apps written using LIO and 2) untrusted third-party apps 
that may interact with users and read and write to the database. The 
database is simply a list of tuples mapping keys (users) to LMVars stor- 
ing lists of users. Apps are separated from each other by URL prefixes. 
For example, the URL http://xycombinator.biz/Appl points to 
Appl. Requests with a particular app's URL prefix are serviced by invoking
the app's request handler in an IFC-constrained, and time-mitigated,
environment. We assume a powerful, but realistic adversary. In particular,
malicious application writers may themselves be users of the dating
site. We now consider the effectiveness of termination and timing
channels in leaking the database. 

Termination covert channel As detailed in Section 3, the implementation 
of LIO [46], with toLabeled, is susceptible to a termination channel attack.
In the context of our dating website, a malicious application term,
running on behalf of an (authenticated) user a, can be used to leak information
on another (target) user t as follows:

► Authenticated adversary a issues a request that contains a guess that
user t has an interest in g: GET /term?target=t&guess=g

► The trusted app container invokes the app term and forwards the 
request to it. 

► The application term then executes the following LIO code: 

toLabeled ⊤ $ do v <- lookupDB t
                 if g == v then ⊥ else return ()
return $ mkHtmlResp200 "Bad guess"

Here, lookupDB t is used to perform a database lookup with key t. If g 
is present in the database entry, the app will not terminate, otherwise 
it will respond, denoting the guess was wrong. 

We found the termination attack to be very effective. Specifically, we 
measured the time required to reconstruct a database of 10 users to be
73 seconds.⁴

If toLabeled is prohibited and forkLIO is used instead, the termination
attack cannot be mounted. This is because waitLIO first raises the
label of the app request handler. An attempt to output a response to the
client browser will not succeed since the current label of the handler cannot
flow to the label of the client's browser. It is important to note that
errors of this kind are made indistinguishable from non-terminating requests.
To accomplish this, our dating site catches label violation errors
and converts them to ⊥.

Internal timing covert channel To carry out an internal timing attack, an 
app must execute two threads that share a common resource. Concretely, 
an app can use internal timing to leak information on a target user t as 
follows: 

► Authenticated adversary a issues a request containing a guess that t
is interested in g: GET /internal?target=t&guess=g

► The trusted app container invokes the app internal. 

► App internal then executes the following LIO code: 

varHigh <- fork $ do
  toLabeled ⊤ $ do
    v <- lookupDB t
    if g == v then sleep 5000 else return ()
  appendToAppStore g
varLow <- fork $ do sleep 3000
                    appendToAppStore (-1)
wait varHigh
wait varLow
r <- readFromAppStore
return $ mkHtmlResp200 r

⁴ All our measurements were conducted on a laptop with an Intel Core i7 2620M
(2.7GHz) processor and 8GB of RAM, with GHC 7.4.1.

The code spawns two threads. The first reads the high value in a
toLabeled block and then outputs the guess to a low-label store; however, if the
guess is correct, it sleeps for five seconds before outputting the guess.
The second thread simply outputs a placeholder after waiting for three
seconds. The result is that the ordering of outputs reveals whether the
guess is correct. If the guess is incorrect, the store will read g, -1; if the
guess is correct, the store will read -1, g.
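
The leak can be reproduced with a deterministic toy model of the two threads. This is only an illustration of the ordering argument, not attack code: storeOrder and decode are hypothetical names, each thread is reduced to the time (in milliseconds) at which it appends to the store, and the database lookup is assumed to take negligible time.

```haskell
import Data.List (sortOn)

-- | Contents of the low-label store after both threads have run, in
-- append order. The first thread appends the guess g, delayed by 5000 ms
-- only when the guess is correct; the second appends -1 after 3000 ms.
storeOrder :: Bool -> Int -> [Int]
storeOrder guessCorrect g = map snd (sortOn fst [(highDelay, g), (3000, -1)])
  where
    highDelay :: Int
    highDelay = if guessCorrect then 5000 else 0

-- | The attacker's decoder: the guess was correct exactly when the
-- placeholder -1 was appended first.
decode :: [Int] -> Bool
decode store = take 1 store == [-1]
```

For a guess g = 7, storeOrder False 7 is [7, -1] and storeOrder True 7 is [-1, 7], so decode recovers the secret comparison from public data alone.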

We implemented a magnified version of the attack above by sending 
several requests to the server. The adversary repeatedly sends requests 
to internal for each user in the system as a guess g. As with the termination
channel attack, we found that the internal timing attack is feasible.
For a database of 10 users we managed to recover the entries in 66.92 
seconds. 

Our modifications to LIO can be used to address the internal timing
attacks described above; replacing toLabeled with forkLIO eliminates
the internal timing leaks. More generally, we observe that by using
forkLIO, the time when the app writes to the persistent storage
(appendToAppStore) cannot be influenced by sensitive data. Similarly,
replacing fork and wait by their LIO counterparts renders the attack
futile.

External timing covert channel We consider a simple external timing attack
on our dating website in which the adversary a has access to a high-precision
timer. An app external colluding with a can use external timing
to leak a target user t's interested-in relationship as follows:

► Authenticated adversary a issues requests containing the target user
t: GET /external?target=t&guess=g

► The trusted container invokes external with the request. 

► App external then proceeds to execute the following LIO code: 

toLabeled ⊤ $ do
  v <- lookupDB t
  if g == v then sleep 5000 else return ()
return $ mkHtmlResp200 "done"

The attack is a component of the internal timing attack: given a target
t and guess g, if g is correct the thread sleeps; otherwise it does
nothing. The attacker simply measures the response time, recognizing
a delay as a correct guess.

Despite its simplicity, we also found this attack to be plausible. In
33 seconds, we recovered a database of 10 users. To address this attack,
we mitigated the app handler, as described in Section 5. The response
time of an app is mitigated, taking into account the arrival of a request.
Although we managed to recover 3 of the 10 user entries in 64 seconds,
we found that recovering the remaining user entries was infeasible. Of
course, the performance of well-behaved apps was unaffected.
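
The quantization idea behind this kind of mitigation can be sketched in a
few lines of Haskell. This is a simplified, per-request illustration in the
spirit of a doubling scheme, not the mitigator of Section 5: the names
releaseAt and observable, and the use of microseconds, are assumptions made
only for this example.

```haskell
-- | Time (in microseconds after the request arrived) at which a response
-- that became ready after `elapsed` microseconds is released. The first
-- deadline is the quantum `q`; every missed deadline doubles it.
-- (Hypothetical helper, for illustration only.)
releaseAt :: Int -> Int -> Int
releaseAt q elapsed
  | q <= 0       = error "releaseAt: quantum must be positive"
  | elapsed <= q = q
  | otherwise    = releaseAt (2 * q) elapsed

-- | All release times an observer can ever see up to a time bound.
-- Their number grows only logarithmically with the bound, which is
-- what limits how much a single response time can reveal.
observable :: Int -> Int -> [Int]
observable q bound
  | q <= 0    = []
  | otherwise = takeWhile (<= bound) (iterate (* 2) q)
```

With a quantum of 500, a handler that sleeps for 5000 and one that returns
immediately are still distinguishable, but only at the granularity of the
few values in `observable 500 bound`, rather than at timer precision.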

9 Related Work 

IFC security libraries The seminal work by Li and Zdancewic [29] presents
an implementation of information-flow security as a library using a
generalization of monads called Arrows [21]. Following this line of work,
Tsai et al. [48] further consider side-effects and concurrency. Unlike
our approach, Tsai et al. provide termination-insensitive
non-interference under a cooperative scheduler with no synchronization
primitives. Russo et al. [38] eliminate the need for Arrows by showing an
IFC security library based solely on monads. Their library leverages
Haskell's type-system to statically enforce non-interference. Jaskelioff
and Russo [23] propose a library that enforces non-interference by
executing the program as many times as there are security levels, a
technique known as secure multi-execution [14]. Recently, Stefan et al.
proposed the use of the LIO monad to track information flow dynamically
[46]. Morgenstern et al. [32] encoded an authorization- and IFC-aware
programming language in Agda. Their encoding, however, does not consider
computations with side-effects. Devriese and Piessens [15] used monad
transformers and parametrized monads [4] to enforce non-interference,
both dynamically and statically. None of the approaches mentioned above
deals with the termination channel. Moreover, none of them (except
Tsai et al.) handles concurrency.

Internal timing covert channel There are several approaches to deal with
the internal timing covert channel. The work in [43-45, 50] relies on the
unrealistic primitive protect(c) which, by definition, hides the timing
behaviour of c. Our approach, on the other hand, relies on the fork
primitive and the semantics for mutable locations. Assuming a scenario
where it is possible to modify the scheduler, the work in [6, 35] proposes
a novel interaction between threads and the scheduler that is able to
implement a generalized version of protect(c). A series of works [22, 47,
52] prevent internal timing leaks by avoiding any races on public data.
Boudol and Castellani [8, 9] avoid internal timing leaks by disallowing
public events after branching on secret data. The authors consider a
fixed number of threads and no synchronization primitives. Russo and
Sabelfeld [36] show how to remove internal timing leaks under cooperative
scheduling by manipulating yield commands. The termination channel is
intrinsically present under cooperative scheduling, i.e., there is no way
to decouple executions between threads. The work by Russo et al. [37] is
the closest one to our approach to internal timing leaks. In that work,
the authors introduce a code transformation, from a sequential program
into a concurrent one, that spawns threads to execute branches and loops
whose conditionals depend on secret values. The idea of spawning threads
when computations use secrets is similar to ours, but it is used in a
quite different context. Firstly, Russo et al. apply their technique to a
simple sequential while-language, while we consider concurrent programs
with synchronization primitives. Secondly, and unlike our work, their
approach does not consider leaks due to termination, i.e., the
transformation guarantees only termination-insensitive non-interference.
Finally, the code transformation introduces synchronization between
spawned threads in order to preserve the semantics of the original
sequential program, which incurs high synchronization costs. The
transformation might change the terminating behavior of programs in
order to preserve security. Our proposal, on the other hand, guarantees
that the semantics of the program is the one that the programmer writes
in the code.

Termination and external covert channels There are several language-based
mechanisms to tackle the termination and external timing channels.
Volpano [49] describes a type-system that removes the termination channel
by forbidding loops whose conditionals depend on secrets. The work by
Hedin and Sands [20] avoids the termination and external timing covert
channels for sequential Java bytecode by disallowing outputs after
branching on secrets. Similarly, LIO computations do not allow public
outputs after observing secret data. However, the programmer can spawn
new threads to perform such computations, thus allowing the rest of the
system to still perform public outputs. Agat [1] describes a code
transformation that removes external timing leaks by padding programs
with dummy computations. The termination channel is closed by
disallowing loops on secrets. One drawback of Agat's transformation is
that if there is an if-then-else whose guard depends on secret data, and
only one of its branches is non-terminating, then the transformed
program becomes non-terminating. This approach has been adapted for
languages with concurrency [39-41]. Moreover, the transformation has
been rephrased as a unification problem [27] as well as being
implemented using transactions [5]. While targeting sequential programs,
secure multi-execution [14] removes both the termination and external
timing channels. However, the latter is only closed if there are as many
CPUs (or cores) as security levels being considered by the technique. We
refer the reader to [25] for a more detailed description of possible
enforcements for timing- and termination-sensitive non-interference.
Recently, Zhang et al. [54] proposed a language-based mitigation approach
for a simple while-language extended with a mitigate primitive. Their
work relies on static annotations to provide information about the
underlying hardware. Compared to their work, our functional approach
is more general and can be extended to address other covert channels
(e.g., storage). However, their attack model is more powerful in
considering the effects of hardware (e.g., cache). Nevertheless, we find
their work to be complementary: our system can leverage static
annotations and the Xeon "no-fill" mode to address attacks relying on
underlying hardware.

10 Summary 

Many information flow control systems allow applications to sequence 
code with publicly visible side-effects after code computing over private 
data. Unfortunately, such sequencing leaks private data through termi- 
nation channels (which affect whether the public side-effects ever hap- 
pen), internal timing channels (which affect the order of publicly visi- 
ble side-effects), and external timing channels (which affect the response 
time of visible side-effects). Such leaks are far worse in the presence of 
concurrency, particularly when untrusted code can spawn new threads. 

We demonstrate that such sequencing can be avoided by introducing 
additional concurrency when public values must reference the results of 
computations over private data. We implemented this idea in an existing 
Haskell information flow library, LIO. In addition, we show how our 
library is amenable to mitigating external timing attacks by quantizing 
the appearance of externally visible side-effects. To evaluate our ideas, 
we prototyped the core of a dating web site showing that our interfaces 
are practical and our implementation does indeed mitigate these covert 
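channels.

Listing 6 Semantics for non-standard expressions.

\[
\begin{aligned}
E ::= \cdots &\mid \mathsf{label}\ E\ e \mid \mathsf{unlabel}\ E \mid \mathsf{out}\ E\ e \mid \mathsf{out}\ l\ E \\
&\mid \mathsf{forkLIO}\ E\ e \mid \mathsf{newLMVar}\ E\ e \mid \mathsf{takeLMVar}\ E \\
&\mid \mathsf{putLMVar}\ E\ e \mid \mathsf{labelOfLMVar}\ E
\end{aligned}
\]

(Lab)
\[
\frac{\sigma.\mathrm{lbl} \sqsubseteq l \sqsubseteq \sigma.\mathrm{clr}}
{\langle \Sigma, \langle \sigma, E[\mathsf{label}\ l\ e] \rangle \rangle \longrightarrow \langle \Sigma, \langle \sigma, E[\mathsf{return}\ (\mathsf{Lb}\ l\ e)] \rangle \rangle}
\]

(unLab)
\[
\frac{l' = \sigma.\mathrm{lbl} \sqcup l \qquad l' \sqsubseteq \sigma.\mathrm{clr} \qquad \sigma' = \sigma[\mathrm{lbl} \mapsto l']}
{\langle \Sigma, \langle \sigma, E[\mathsf{unlabel}\ (\mathsf{Lb}\ l\ e)] \rangle \rangle \longrightarrow \langle \Sigma, \langle \sigma', E[\mathsf{return}\ e] \rangle \rangle}
\]

(Output)
\[
\frac{\sigma.\mathrm{lbl} \sqsubseteq l \sqsubseteq \sigma.\mathrm{clr} \qquad \Sigma' = \Sigma[\alpha_l \mapsto \Sigma.\alpha_l \rhd \mathsf{out}(v)]}
{\langle \Sigma, \langle \sigma, E[\mathsf{out}\ l\ v] \rangle \rangle \longrightarrow \langle \Sigma', \langle \sigma, E[\mathsf{return}\ ()] \rangle \rangle}
\]

(lFork)
\[
\frac{\sigma.\mathrm{lbl} \sqsubseteq l \sqsubseteq \sigma.\mathrm{clr} \qquad \Sigma' = \Sigma[\phi \mapsto \Sigma.\phi[m \mapsto \mathsf{Lb}\ l\ \square]] \qquad e' = e \mathbin{>\!\!>\!\!=} \lambda x.\ \mathsf{putLMVar}\ m\ x \qquad m\ \text{fresh}}
{\langle \Sigma, \langle \sigma, E[\mathsf{forkLIO}\ l\ e] \rangle \rangle \xrightarrow{\mathsf{fork}(e')} \langle \Sigma', \langle \sigma, E[\mathsf{return}\ (\mathsf{R}\ m)] \rangle \rangle}
\]

(lWait)
\[
\langle \Sigma, \langle \sigma, E[\mathsf{waitLIO}\ (\mathsf{R}\ m)] \rangle \rangle \longrightarrow \langle \Sigma, \langle \sigma, E[\mathsf{takeLMVar}\ m] \rangle \rangle
\]

(nLMVar)
\[
\frac{\sigma.\mathrm{lbl} \sqsubseteq l \sqsubseteq \sigma.\mathrm{clr} \qquad \Sigma' = \Sigma[\phi \mapsto \Sigma.\phi[m \mapsto \mathsf{Lb}\ l\ e]] \qquad m\ \text{fresh}}
{\langle \Sigma, \langle \sigma, E[\mathsf{newLMVar}\ l\ e] \rangle \rangle \longrightarrow \langle \Sigma', \langle \sigma, E[\mathsf{return}\ m] \rangle \rangle}
\]

(tLMVar)
\[
\frac{\Sigma.\phi(m) = \mathsf{Lb}\ l\ e \qquad \sigma.\mathrm{lbl} \sqsubseteq l \sqsubseteq \sigma.\mathrm{clr} \qquad \sigma' = \sigma[\mathrm{lbl} \mapsto \sigma.\mathrm{lbl} \sqcup l] \qquad \Sigma' = \Sigma[\phi \mapsto \Sigma.\phi[m \mapsto \mathsf{Lb}\ l\ \square]]}
{\langle \Sigma, \langle \sigma, E[\mathsf{takeLMVar}\ m] \rangle \rangle \longrightarrow \langle \Sigma', \langle \sigma', E[\mathsf{return}\ e] \rangle \rangle}
\]

(pLMVar)
\[
\frac{\Sigma.\phi(m) = \mathsf{Lb}\ l\ \square \qquad \sigma.\mathrm{lbl} \sqsubseteq l \sqsubseteq \sigma.\mathrm{clr} \qquad \sigma' = \sigma[\mathrm{lbl} \mapsto \sigma.\mathrm{lbl} \sqcup l] \qquad \Sigma' = \Sigma[\phi \mapsto \Sigma.\phi[m \mapsto \mathsf{Lb}\ l\ e]]}
{\langle \Sigma, \langle \sigma, E[\mathsf{putLMVar}\ m\ e] \rangle \rangle \longrightarrow \langle \Sigma', \langle \sigma', E[\mathsf{return}\ ()] \rangle \rangle}
\]

(gLabR)
\[
\frac{e = \Sigma.\phi(m)}
{\langle \Sigma, \langle \sigma, E[\mathsf{labelOfLMVar}\ m] \rangle \rangle \longrightarrow \langle \Sigma, \langle \sigma, E[\mathsf{labelOf}\ e] \rangle \rangle}
\]

Listing 7 Semantics for threadpools.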



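(Step)
\[
\frac{\langle \Sigma, t \rangle \longrightarrow \langle \Sigma', t' \rangle}
{\langle \Sigma, t \lhd t_s \rangle \hookrightarrow \langle \Sigma', t_s \rhd t' \rangle}
\]

(No-Step)
\[
\frac{\langle \Sigma, t \rangle \not\longrightarrow}
{\langle \Sigma, t \lhd t_s \rangle \hookrightarrow \langle \Sigma, t_s \rhd t \rangle}
\]

(Fork)
\[
\frac{\langle \Sigma, t \rangle \xrightarrow{\mathsf{fork}(e)} \langle \Sigma', \langle \sigma, e' \rangle \rangle \qquad t_{new} = \langle \sigma, e \rangle}
{\langle \Sigma, t \lhd t_s \rangle \hookrightarrow \langle \Sigma', t_s \rhd \langle \sigma, e' \rangle \rhd t_{new} \rangle}
\]

(Exit)
\[
\frac{l = \sigma.\mathrm{lbl} \qquad \Sigma' = \Sigma[\alpha_l \mapsto \Sigma.\alpha_l \rhd \mathsf{exit}(v)]}
{\langle \Sigma, \langle \sigma, v \rangle \lhd t_s \rangle \hookrightarrow \langle \Sigma', t_s \rangle}
\]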



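Listing 8 Erasure function.

\[
\varepsilon_L(\langle \Sigma, t_s \rangle) = \langle \varepsilon_L(\Sigma),\ \mathsf{filter}\ (\lambda \langle \sigma, e \rangle.\ e \not\equiv \bullet)\ (\mathsf{map}\ \varepsilon_L\ t_s) \rangle
\]
\[
\varepsilon_L(\langle \sigma, e \rangle) =
\begin{cases}
\langle \sigma, \bullet \rangle & \sigma.\mathrm{lbl} \not\sqsubseteq L \\
\langle \sigma, \varepsilon_L(e) \rangle & \text{otherwise}
\end{cases}
\]
\[
\varepsilon_L(\Sigma) = \Sigma[\phi \mapsto \varepsilon_L(\Sigma.\phi)][\alpha_l \mapsto \varepsilon_L(\alpha_l)]_{l \in \mathit{Labels}}
\]
\[
\varepsilon_L(\alpha_l) =
\begin{cases}
\epsilon & l \not\sqsubseteq L \\
\mathsf{map}\ \varepsilon_L\ \alpha_l & \text{otherwise}
\end{cases}
\]
\[
\varepsilon_L(\phi) = \{ (x, \varepsilon_L(\phi(x))) : x \in \mathrm{dom}(\phi) \}
\]
\[
\varepsilon_L(\mathsf{Lb}\ l\ e) =
\begin{cases}
\mathsf{Lb}\ l\ \bullet & l \not\sqsubseteq L \\
\mathsf{Lb}\ l\ \varepsilon_L(e) & \text{otherwise}
\end{cases}
\]

In the rest of the cases, $\varepsilon_L$ is homomorphic.

1 Detailed proofs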

In this section, we provide more details about the proofs for the results 
in Section 7 as well as some auxiliary lemmas. 

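The following lemmas are necessary to prove that $\longrightarrow_L$ and
$\hookrightarrow_L$ are deterministic.

Proposition 3 (Determinacy of $\longrightarrow$). If
$\langle \Sigma, t_s \rangle \longrightarrow \langle \Sigma', t'_s \rangle$ and
$\langle \Sigma, t_s \rangle \longrightarrow \langle \Sigma'', t''_s \rangle$,
then $\langle \Sigma', t'_s \rangle = \langle \Sigma'', t''_s \rangle$.

Proof. By induction on expressions and evaluation contexts, showing
there is always a unique redex in every step.

Proposition 1 (Determinacy of $\longrightarrow_L$). If
$\langle \Sigma, t \rangle \longrightarrow_L \langle \Sigma', t' \rangle$ and
$\langle \Sigma, t \rangle \longrightarrow_L \langle \Sigma'', t'' \rangle$,
then $\langle \Sigma', t' \rangle = \langle \Sigma'', t'' \rangle$.

Proof. By Proposition 3 and the definition of $\varepsilon_L$.

Proposition 4 (Determinacy of $\hookrightarrow$). If
$\langle \Sigma, t_s \rangle \hookrightarrow \langle \Sigma', t'_s \rangle$ and
$\langle \Sigma, t_s \rangle \hookrightarrow \langle \Sigma'', t''_s \rangle$,
then $\langle \Sigma', t'_s \rangle = \langle \Sigma'', t''_s \rangle$.

Proof. By induction on expressions and evaluation contexts, showing
there is always a unique redex in every step, and using Proposition 3.

Proposition 2 (Determinacy of $\hookrightarrow_L$). If
$\langle \Sigma, t_s \rangle \hookrightarrow_L \langle \Sigma', t'_s \rangle$ and
$\langle \Sigma, t_s \rangle \hookrightarrow_L \langle \Sigma'', t''_s \rangle$,
then $\langle \Sigma', t'_s \rangle = \langle \Sigma'', t''_s \rangle$.

Proof. By Proposition 4 and the definition of $\varepsilon_L$.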

The following proposition shows that the erasure function is homo- 
morphic to the application of evaluation contexts and substitution, and 
that it is idempotent. 

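Proposition 5 (Properties of erasure function).

1. $\varepsilon_L(E[e]) = \varepsilon_L(E)[\varepsilon_L(e)]$

2. $\varepsilon_L([e_2/x]e_1) = [\varepsilon_L(e_2)/x]\varepsilon_L(e_1)$

3. $\varepsilon_L(\varepsilon_L(e)) = \varepsilon_L(e)$

4. $\varepsilon_L(\varepsilon_L(E)) = \varepsilon_L(E)$

5. $\varepsilon_L(\varepsilon_L(\Sigma)) = \varepsilon_L(\Sigma)$

6. $\varepsilon_L(\varepsilon_L(\langle \sigma, e \rangle)) = \varepsilon_L(\langle \sigma, e \rangle)$

7. $\varepsilon_L(\varepsilon_L(t_s)) = \varepsilon_L(t_s)$

8. $\varepsilon_L(\varepsilon_L(\langle \Sigma, t_s \rangle)) = \varepsilon_L(\langle \Sigma, t_s \rangle)$

Proof. All follow from the definition of the erasure function
$\varepsilon_L$, and by induction on expressions and evaluation contexts.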
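
To make the idempotence property (item 3) concrete, the following sketch
models erasure on a toy expression type. It is an illustration under
assumptions, not the calculus itself: the two-point lattice Low ⊑ High, the
constructor names, and the single Pair constructor standing in for all
homomorphic cases are chosen only for this example.

```haskell
-- Assumed two-point lattice: Low ⊑ High (derived Ord gives Low < High).
data Label = Low | High deriving (Eq, Ord, Show)

canFlowTo :: Label -> Label -> Bool
canFlowTo = (<=)

-- Toy expressions: Hole plays the role of the erased term (the bullet),
-- Pair stands in for every construct on which erasure is homomorphic.
data Expr
  = Unit
  | Hole
  | Lb Label Expr
  | Pair Expr Expr
  deriving (Eq, Show)

-- Erasure at observer level lvl: contents of labeled values whose
-- label does not flow to lvl are replaced by Hole.
erase :: Label -> Expr -> Expr
erase lvl (Lb l e)
  | l `canFlowTo` lvl = Lb l (erase lvl e)
  | otherwise         = Lb l Hole
erase lvl (Pair a b) = Pair (erase lvl a) (erase lvl b)
erase _ e = e

-- Item 3 of Proposition 5, checked on one expression.
idempotentOn :: Label -> Expr -> Bool
idempotentOn lvl e = erase lvl (erase lvl e) == erase lvl e
```

Checking `idempotentOn` on sample terms is of course no substitute for the
inductive proof; it only shows what the statement says operationally.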

Most of the reduction rules in Listing 6 will change the runtime
environment. In addition, these transformations usually depend on a given
expression, e.g. $\Sigma \mapsto \Sigma[\phi \mapsto \Sigma.\phi[m \mapsto e]]$
can be seen as a function of $e$. We will represent these runtime
transformations as functions
$h : \mathcal{E} \times \mathbf{\Sigma} \to \mathbf{\Sigma}$, where
$\mathcal{E}$ is the set of expressions and $\mathbf{\Sigma}$ is the set
of runtime environments. We will also write
$h_e : \mathbf{\Sigma} \to \mathbf{\Sigma}$ for the partial application of
$h$ to an expression $e$. We extend this notation to transformations of
stores and output channels.

We say that a transformation $f : \mathcal{E} \times A \to A$ is
L-independent if the secrets introduced in structure $A$ by the
application of $f_e$ cannot be observed by an attacker at level $L$, i.e.

\[
\varepsilon_L \circ f_e = \varepsilon_L \circ f_{\varepsilon_L(e)} \circ \varepsilon_L.
\]

The next lemma is useful in proving that a given environment trans- 
formation is L-independent, by showing that its corresponding store 
and output channel transformations are L-independent. 

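Lemma 2. Let $h_e$ be a transformation for runtime environments that
depends on an expression $e$, given as
$h_e(\Sigma) = \Sigma[\phi \mapsto f_e(\Sigma.\phi)][\alpha_l \mapsto g_l^e(\Sigma.\alpha_l)]$
and thus uniquely determined by functions $f_e$ and $g_l^e$ for every
label $l$ and expression $e$. If $f$ and $g_l$ are all L-independent,
then $h$ is L-independent.

Proof.

\[
\begin{aligned}
& \varepsilon_L(h_{\varepsilon_L(e)}(\varepsilon_L(\Sigma))) \\
&= \varepsilon_L(\varepsilon_L(\Sigma)[\phi \mapsto f_{\varepsilon_L(e)}(\varepsilon_L(\Sigma).\phi)][\alpha_l \mapsto g_l^{\varepsilon_L(e)}(\varepsilon_L(\Sigma).\alpha_l)]) \\
&= \varepsilon_L(\varepsilon_L(\Sigma)[\phi \mapsto f_{\varepsilon_L(e)}(\varepsilon_L(\Sigma.\phi))][\alpha_l \mapsto g_l^{\varepsilon_L(e)}(\varepsilon_L(\Sigma.\alpha_l))]) \\
&= \varepsilon_L(\varepsilon_L(\Sigma))[\phi \mapsto \varepsilon_L(f_{\varepsilon_L(e)}(\varepsilon_L(\Sigma.\phi)))][\alpha_l \mapsto \varepsilon_L(g_l^{\varepsilon_L(e)}(\varepsilon_L(\Sigma.\alpha_l)))] \\
&= \varepsilon_L(\Sigma)[\phi \mapsto \varepsilon_L(f_e(\Sigma.\phi))][\alpha_l \mapsto \varepsilon_L(g_l^e(\Sigma.\alpha_l))] \\
&= \varepsilon_L(\Sigma[\phi \mapsto f_e(\Sigma.\phi)][\alpha_l \mapsto g_l^e(\Sigma.\alpha_l)]) \\
&= \varepsilon_L(h_e(\Sigma))
\end{aligned}
\]

The next lemma shows that the environment transformations in the
reduction rules are all L-independent.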

Lemma 3. All runtime transformations h e in the reduction rules in Listing 6 
are L-independent. 

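Proof. There are two cases to consider: modifications to the store
($\phi$), which only update the contents of one reference, or appending a
value to an output channel.

► Case $h_e(\Sigma) = \Sigma[\phi \mapsto f_e(\Sigma.\phi)]$, with
$f_e(\phi) = \phi[m \mapsto \mathsf{Lb}\ l\ e]$. By Lemma 2, we only have
to prove that $f$ is L-independent. We consider two cases:

• $l \sqsubseteq L$:

\[
\begin{aligned}
& \varepsilon_L(f_{\varepsilon_L(e)}(\varepsilon_L(\phi))) \\
&= \varepsilon_L(\varepsilon_L(\phi)[m \mapsto \mathsf{Lb}\ l\ \varepsilon_L(e)]) \\
&= \varepsilon_L(\varepsilon_L(\phi[m \mapsto \mathsf{Lb}\ l\ e])) \\
&= \varepsilon_L(f_e(\phi))
\end{aligned}
\]

• $l \not\sqsubseteq L$:

\[
\begin{aligned}
& \varepsilon_L(f_{\varepsilon_L(e)}(\varepsilon_L(\phi))) \\
&= \varepsilon_L(\varepsilon_L(\phi)[m \mapsto \mathsf{Lb}\ l\ \varepsilon_L(e)]) \\
&= \varepsilon_L(\varepsilon_L(\phi))[m \mapsto \mathsf{Lb}\ l\ \bullet] \\
&= \varepsilon_L(\phi)[m \mapsto \mathsf{Lb}\ l\ \bullet] \\
&= \varepsilon_L(f_e(\phi))
\end{aligned}
\]

► Case $h_e(\Sigma) = \Sigma[\alpha_l \mapsto g_l^e(\Sigma.\alpha_l)]$ with
$g_l^e(\alpha) = \alpha \rhd e$. By Lemma 2, we only have to prove that
$g_l$ is L-independent.

\[
\begin{aligned}
& \varepsilon_L(g_l^{\varepsilon_L(e)}(\varepsilon_L(\alpha))) \\
&= \varepsilon_L(\varepsilon_L(\alpha) \rhd \varepsilon_L(e)) \\
&= \varepsilon_L(\varepsilon_L(\alpha \rhd e)) \\
&= \varepsilon_L(g_l^e(\alpha))
\end{aligned}
\]

► The rest of the cases are similar.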
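
The store case of this proof can be exercised on a small executable model.
The sketch below is only an illustration under assumptions: a two-point
lattice, a toy expression type, and stores represented as association
lists; none of these names come from the calculus or from the LIO library.

```haskell
-- Assumed two-point lattice: Low ⊑ High.
data Label = Low | High deriving (Eq, Ord, Show)

data Expr = Unit | Hole | Lb Label Expr deriving (Eq, Show)

-- Erasure on toy expressions (Hole is the erased term).
erase :: Label -> Expr -> Expr
erase lvl (Lb l e)
  | l <= lvl  = Lb l (erase lvl e)
  | otherwise = Lb l Hole
erase _ e = e

-- A store maps reference names to expressions.
type Store = [(String, Expr)]

eraseStore :: Label -> Store -> Store
eraseStore lvl = map (\(m, v) -> (m, erase lvl v))

-- f_e(phi) = phi[m |-> Lb l e]: overwrite reference m.
storeWrite :: String -> Label -> Expr -> Store -> Store
storeWrite m l e phi = (m, Lb l e) : filter ((/= m) . fst) phi

-- L-independence of storeWrite at observer level lvl:
-- erase . f_e == erase . f_(erase e) . erase
lIndependent :: Label -> String -> Label -> Expr -> Store -> Bool
lIndependent lvl m l e phi =
  eraseStore lvl (storeWrite m l e phi)
    == eraseStore lvl (storeWrite m l (erase lvl e) (eraseStore lvl phi))
```

Both branches of the proof correspond to calls with `l = Low` and
`l = High` for an observer at level `Low`.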

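The following lemma establishes a simulation between $\longrightarrow$
and $\longrightarrow_L$ when reducing the body of a thread whose current
label is below or equal to level $L$.

Lemma 4 (Single-step simulation for public computations).
If $\langle \Sigma, \langle \sigma, t \rangle \rangle \longrightarrow \langle \Sigma', t' \rangle$
with $\sigma.\mathrm{lbl} \sqsubseteq L$, then
$\varepsilon_L(\langle \Sigma, \langle \sigma, t \rangle \rangle) \longrightarrow_L \varepsilon_L(\langle \Sigma', t' \rangle)$, i.e.,

\[
\begin{array}{ccc}
\langle \Sigma, \langle \sigma, e \rangle \rangle & \longrightarrow & \langle \Sigma', t' \rangle \\
\downarrow \varepsilon_L & & \downarrow \varepsilon_L \\
\varepsilon_L(\langle \Sigma, \langle \sigma, e \rangle \rangle) & \longrightarrow_L & \varepsilon_L(\langle \Sigma', t' \rangle)
\end{array}
\]

Proof. The proof is by case analysis on the rule used to derive
$\langle \Sigma, \langle \sigma, t \rangle \rangle \longrightarrow \langle \Sigma', t' \rangle$.
As shown in Lemma 3, all environment modifications are consistent with
the simulation: erasing secret data and then modifying the environment
with erased data is equivalent to modifying the environment and then
erasing the secrets.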

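► Case $t = E[\mathsf{forkLIO}\ l\ e]$

\[
\begin{aligned}
& \varepsilon_L(\langle \Sigma, \langle \sigma, E[\mathsf{forkLIO}\ l\ e] \rangle \rangle) \\
&= \langle \varepsilon_L(\Sigma), \langle \sigma, \varepsilon_L(E)[\mathsf{forkLIO}\ l\ \varepsilon_L(e)] \rangle \rangle \\
&\longrightarrow_L \varepsilon_L(\langle \varepsilon_L(\Sigma^1), \langle \sigma, \varepsilon_L(E)[\mathsf{return}\ ()] \rangle \rangle) \\
&= \varepsilon_L(\langle \varepsilon_L(\Sigma^1), \varepsilon_L(\langle \sigma, E[\mathsf{return}\ ()] \rangle) \rangle) \\
&\qquad \text{(by Lemma 3, } \varepsilon_L(\Sigma^1) = \varepsilon_L(\Sigma') \text{)} \\
&= \varepsilon_L(\langle \Sigma', \langle \sigma, E[\mathsf{return}\ ()] \rangle \rangle)
\end{aligned}
\]

► Case $t = E[\mathsf{out}\ l\ e]$

\[
\begin{aligned}
& \varepsilon_L(\langle \Sigma, \langle \sigma, E[\mathsf{out}\ l\ e] \rangle \rangle) \\
&= \langle \varepsilon_L(\Sigma), \langle \sigma, \varepsilon_L(E)[\mathsf{out}\ l\ \varepsilon_L(e)] \rangle \rangle \\
&\longrightarrow_L \varepsilon_L(\langle \varepsilon_L(\Sigma^1), \langle \sigma, \varepsilon_L(E)[\mathsf{return}\ ()] \rangle \rangle) \\
&= \varepsilon_L(\langle \varepsilon_L(\Sigma^1), \varepsilon_L(\langle \sigma, E[\mathsf{return}\ ()] \rangle) \rangle) \\
&= \varepsilon_L(\langle \Sigma', \langle \sigma, E[\mathsf{return}\ ()] \rangle \rangle)
\end{aligned}
\]

► Case $t = E[\mathsf{takeLMVar}\ m]$

\[
\begin{aligned}
& \varepsilon_L(\langle \Sigma, \langle \sigma, E[\mathsf{takeLMVar}\ m] \rangle \rangle) \\
&= \langle \varepsilon_L(\Sigma), \langle \sigma, \varepsilon_L(E)[\mathsf{takeLMVar}\ m] \rangle \rangle \\
&\longrightarrow_L \varepsilon_L(\langle \varepsilon_L(\Sigma^1), \langle \sigma', \varepsilon_L(E)[\mathsf{return}\ \varepsilon_L(e)] \rangle \rangle)
\end{aligned}
\]

Note that now $\sigma'.\mathrm{lbl} = l$. We consider two cases:

• $l \sqsubseteq L$:

\[
\begin{aligned}
& \varepsilon_L(\langle \varepsilon_L(\Sigma^1), \langle \sigma', \varepsilon_L(E)[\mathsf{return}\ \varepsilon_L(e)] \rangle \rangle) \\
&= \varepsilon_L(\langle \Sigma', \langle \sigma', E[\mathsf{return}\ e] \rangle \rangle)
\end{aligned}
\]

• $l \not\sqsubseteq L$:

\[
\begin{aligned}
& \varepsilon_L(\langle \varepsilon_L(\Sigma^1), \langle \sigma', \varepsilon_L(E)[\mathsf{return}\ \varepsilon_L(e)] \rangle \rangle) \\
&= \varepsilon_L(\langle \Sigma', \langle \sigma', E[\mathsf{return}\ e] \rangle \rangle)
\end{aligned}
\]

In both cases, it follows that $\varepsilon_L(\Sigma^1) = \varepsilon_L(\Sigma')$ by Lemma 3.

► Case $t = E[\mathsf{waitLIO}\ (\mathsf{R}\ m)]$. Trivially reduces to the
$t = E[\mathsf{takeLMVar}\ m]$ case.

► Case $t = E[\mathsf{newLMVar}\ l\ e]$.

\[
\begin{aligned}
& \varepsilon_L(\langle \Sigma, \langle \sigma, E[\mathsf{newLMVar}\ l\ e] \rangle \rangle) \\
&= \langle \varepsilon_L(\Sigma), \langle \sigma, \varepsilon_L(E)[\mathsf{newLMVar}\ l\ \varepsilon_L(e)] \rangle \rangle \\
&\longrightarrow_L \varepsilon_L(\langle \Sigma^1, \langle \sigma, \varepsilon_L(E)[\mathsf{return}\ m] \rangle \rangle) \\
&= \varepsilon_L(\langle \Sigma^1, \varepsilon_L(\langle \sigma, E[\mathsf{return}\ m] \rangle) \rangle) \\
&= \varepsilon_L(\langle \Sigma', \langle \sigma, E[\mathsf{return}\ m] \rangle \rangle)
\end{aligned}
\]

► Case $t = E[\mathsf{putLMVar}\ m\ e]$.

\[
\begin{aligned}
& \varepsilon_L(\langle \Sigma, \langle \sigma, E[\mathsf{putLMVar}\ m\ e] \rangle \rangle) \\
&= \langle \varepsilon_L(\Sigma), \langle \sigma, \varepsilon_L(E)[\mathsf{putLMVar}\ m\ \varepsilon_L(e)] \rangle \rangle \\
&\longrightarrow_L \varepsilon_L(\langle \Sigma^1, \langle \sigma, \varepsilon_L(E)[\mathsf{return}\ ()] \rangle \rangle) \\
&= \varepsilon_L(\langle \Sigma^1, \varepsilon_L(\langle \sigma, E[\mathsf{return}\ ()] \rangle) \rangle) \\
&= \varepsilon_L(\langle \Sigma', \langle \sigma, E[\mathsf{return}\ ()] \rangle \rangle)
\end{aligned}
\]

► The rest of the cases are similar.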

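The following lemma establishes a simulation between $\hookrightarrow$
and $\hookrightarrow_L$ when reducing the body of a thread whose current
label is below or equal to level $L$.

Lemma 5 (Single-step simulation for public computations). If
$\langle \Sigma, \langle \sigma, t \rangle \lhd t_s \rangle \hookrightarrow \langle \Sigma', t'_s \rangle$
with $\sigma.\mathrm{lbl} \sqsubseteq L$, then
$\varepsilon_L(\langle \Sigma, \langle \sigma, t \rangle \lhd t_s \rangle) \hookrightarrow_L \varepsilon_L(\langle \Sigma', t'_s \rangle)$, i.e.,

\[
\begin{array}{ccc}
\langle \Sigma, \langle \sigma, e \rangle \lhd t_s \rangle & \hookrightarrow & \langle \Sigma', t'_s \rangle \\
\downarrow \varepsilon_L & & \downarrow \varepsilon_L \\
\varepsilon_L(\langle \Sigma, \langle \sigma, e \rangle \lhd t_s \rangle) & \hookrightarrow_L & \varepsilon_L(\langle \Sigma', t'_s \rangle)
\end{array}
\]

Proof. The proof is by case analysis on the rule used to derive
$\langle \Sigma, \langle \sigma, t \rangle \lhd t_s \rangle \hookrightarrow \langle \Sigma', t'_s \rangle$.

► Case (Step). By Lemma 4, we know that
$\varepsilon_L(\langle \Sigma, t \rangle) \longrightarrow_L \varepsilon_L(\langle \Sigma', t' \rangle)$,
so $\langle \varepsilon_L(\Sigma), \varepsilon_L(t) \lhd \varepsilon_L(t_s) \rangle \hookrightarrow_L \varepsilon_L(\langle \varepsilon_L(\Sigma'), \varepsilon_L(t_s) \rhd \varepsilon_L(t') \rangle)$.

\[
\begin{aligned}
& \varepsilon_L(\langle \Sigma, t \lhd t_s \rangle) \\
&= \langle \varepsilon_L(\Sigma), \varepsilon_L(t) \lhd \varepsilon_L(t_s) \rangle \\
&\hookrightarrow_L \varepsilon_L(\langle \varepsilon_L(\Sigma'), \varepsilon_L(t_s) \rhd \varepsilon_L(t') \rangle) \\
&= \varepsilon_L(\langle \Sigma', t_s \rhd t' \rangle)
\end{aligned}
\]

► Case (No-Step).

\[
\begin{aligned}
& \varepsilon_L(\langle \Sigma, t \lhd t_s \rangle) \\
&= \langle \varepsilon_L(\Sigma), \varepsilon_L(t) \lhd \varepsilon_L(t_s) \rangle \\
&\hookrightarrow_L \varepsilon_L(\langle \varepsilon_L(\Sigma), \varepsilon_L(t_s) \rhd \varepsilon_L(t) \rangle) \\
&= \varepsilon_L(\langle \Sigma, t_s \rhd t \rangle)
\end{aligned}
\]

► Case (Fork).

\[
\begin{aligned}
& \varepsilon_L(\langle \Sigma, \langle \sigma, t \rangle \lhd t_s \rangle) \\
&= \langle \varepsilon_L(\Sigma), \langle \sigma, \varepsilon_L(t) \rangle \lhd \varepsilon_L(t_s) \rangle \\
&\hookrightarrow_L \varepsilon_L(\langle \varepsilon_L(\Sigma'), \varepsilon_L(t_s) \rhd \langle \sigma, \varepsilon_L(t') \rangle \rhd t_{new} \rangle) \\
&= \langle \varepsilon_L(\Sigma'), \varepsilon_L(t_s \rhd \langle \sigma, t' \rangle \rhd t_{new}) \rangle \\
&= \varepsilon_L(\langle \Sigma', t_s \rhd \langle \sigma, t' \rangle \rhd t_{new} \rangle)
\end{aligned}
\]

► Case (Exit).

\[
\begin{aligned}
& \varepsilon_L(\langle \Sigma, \langle \sigma, v \rangle \lhd t_s \rangle) \\
&= \langle \varepsilon_L(\Sigma), \langle \sigma, \varepsilon_L(v) \rangle \lhd \varepsilon_L(t_s) \rangle \\
&\hookrightarrow_L \varepsilon_L(\langle \Sigma', \varepsilon_L(t_s) \rangle) \\
&= \varepsilon_L(\langle \Sigma', t_s \rangle)
\end{aligned}
\]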

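We can also show that initial and final configurations for any reduction
steps taken from a thread above $L$ are equal when erased.

Lemma 6. If $\langle \Sigma, \langle \sigma, e \rangle \rangle \longrightarrow \langle \Sigma', t' \rangle$
with $\sigma.\mathrm{lbl} \not\sqsubseteq L$, then
$\varepsilon_L(\langle \Sigma, \langle \sigma, e \rangle \rangle) = \varepsilon_L(\langle \Sigma', t' \rangle)$, i.e.,

\[
\begin{array}{ccc}
\langle \Sigma, \langle \sigma, e \rangle \rangle & \longrightarrow & \langle \Sigma', t' \rangle \\
\downarrow \varepsilon_L & & \downarrow \varepsilon_L \\
\varepsilon_L(\langle \Sigma, \langle \sigma, e \rangle \rangle) & = & \varepsilon_L(\langle \Sigma', t' \rangle)
\end{array}
\]

Proof. Since $\varepsilon_L(\langle \Sigma, \langle \sigma, e \rangle \rangle) = \langle \varepsilon_L(\Sigma), \langle \sigma, \bullet \rangle \rangle$,
we only have to show that $\varepsilon_L(\Sigma) = \varepsilon_L(\Sigma')$,
where $\Sigma'$ is the modified environment after performing the
reduction step. The proof is similar to L-independence for the
simulation lemma: for an arbitrary environment transformation $h_e$, we
have to prove that $\varepsilon_L \circ h_e = \varepsilon_L$.

► Case $h_e(\Sigma) = \Sigma[\phi \mapsto f_e(\Sigma.\phi)]$, with
$f_e(\phi) = \phi[m \mapsto \mathsf{Lb}\ l\ e]$. We prove that
$\varepsilon_L \circ f_e = \varepsilon_L$.

\[
\begin{aligned}
& \varepsilon_L(f_e(\phi)) \\
&= \varepsilon_L(\phi[m \mapsto \mathsf{Lb}\ l\ e]) \\
&= \varepsilon_L(\phi[m \mapsto \mathsf{Lb}\ l\ \bullet])
\end{aligned}
\]

► Case $h_e(\Sigma) = \Sigma[\alpha_l \mapsto g_l^e(\Sigma.\alpha_l)]$ with
$g_l^e(\alpha) = \alpha \rhd e$. Analogous.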

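Lemma 7. If $\langle \Sigma, \langle \sigma, e \rangle \lhd t_s \rangle \hookrightarrow \langle \Sigma', t'_s \rangle$
with $\sigma.\mathrm{lbl} \not\sqsubseteq L$, then
$\varepsilon_L(\langle \Sigma, \langle \sigma, e \rangle \lhd t_s \rangle) = \varepsilon_L(\langle \Sigma', t'_s \rangle)$, i.e.,

\[
\begin{array}{ccc}
\langle \Sigma, \langle \sigma, e \rangle \lhd t_s \rangle & \hookrightarrow & \langle \Sigma', t'_s \rangle \\
\downarrow \varepsilon_L & & \downarrow \varepsilon_L \\
\varepsilon_L(\langle \Sigma, \langle \sigma, e \rangle \lhd t_s \rangle) & = & \varepsilon_L(\langle \Sigma', t'_s \rangle)
\end{array}
\]

Proof. We illustrate the proof in the case of rule (Step). Let
$\langle \Sigma, \langle \sigma, e \rangle \lhd t_s \rangle \hookrightarrow \langle \Sigma', t_s \rhd \langle \sigma, e' \rangle \rangle$, then

\[
\begin{aligned}
& \varepsilon_L(\langle \Sigma, \langle \sigma, e \rangle \lhd t_s \rangle) \\
&= \langle \varepsilon_L(\Sigma), \varepsilon_L(t_s) \rangle \\
&= \varepsilon_L(\langle \Sigma', t_s \rhd \langle \sigma, e' \rangle \rangle)
\end{aligned}
\]

The other cases are similar.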

The next lemma establishes a simulation between $\hookrightarrow^*$ and
$\hookrightarrow^*_L$.

Lemma 1 (Many-step simulation). If
$\langle \Sigma, t_s \rangle \hookrightarrow^* \langle \Sigma', t'_s \rangle$, then
$\varepsilon_L(\langle \Sigma, t_s \rangle) \hookrightarrow^*_L \varepsilon_L(\langle \Sigma', t'_s \rangle)$.

Proof. In order to prove this result, we rely on properties of the erasure
function, such as the fact that it is idempotent and homomorphic to the
application of evaluation contexts and substitution.

The proof is by induction on the derivation of
$\langle \Sigma, t_s \rangle \hookrightarrow^* \langle \Sigma', t'_s \rangle$.
We consider a thread queue of the form $\langle \sigma, e \rangle \lhd r_s$,
and suppose that
$\langle \Sigma, \langle \sigma, e \rangle \lhd r_s \rangle \hookrightarrow \langle \Sigma^1, t^1_s \rangle$
and $\langle \Sigma^1, t^1_s \rangle \hookrightarrow^* \langle \Sigma', t'_s \rangle$
(otherwise the reduction is not making any progress, and the result is
trivial).

► If $\sigma.\mathrm{lbl} \sqsubseteq L$, the result follows by Lemma 5 and
the induction hypothesis.

► If $\sigma.\mathrm{lbl} \not\sqsubseteq L$, the result follows by Lemma 7
and the induction hypothesis.

We can now prove the non-interference theorem. 

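Theorem 2 (Termination-sensitive non-interference). Given a computation
$e$ (with no $\mathsf{Lb}$, $(\cdot)^{\mathrm{LIO}}$, $\mathsf{R}$, and
$\bullet$) where
$\Gamma \vdash e : \mathsf{Labeled}\ \ell\ \tau \to \mathsf{LIO}\ \ell\ (\mathsf{Labeled}\ \ell\ \tau')$,
an attacker at level $L$, an initial security context $\sigma$, and
runtime environments $\Sigma_1$ and $\Sigma_2$ where
$\Sigma_1.\phi = \Sigma_2.\phi = \emptyset$ and
$\Sigma_1.\alpha_k = \Sigma_2.\alpha_k = \epsilon$ for all levels $k$, then

\[
\forall e_1\, e_2.\ (\Gamma \vdash e_i : \mathsf{Labeled}\ \ell\ \tau)_{i=1,2} \wedge e_1 \approx_L e_2
\implies \langle \Sigma_1, \langle \sigma, e\ e_1 \rangle \rangle \approx_L \langle \Sigma_2, \langle \sigma, e\ e_2 \rangle \rangle
\]

Proof. Take
$\langle \Sigma_1, \langle \sigma, e\ e_1 \rangle \rangle \hookrightarrow^* \langle \Sigma'_1, t^1_s \rangle$
and apply Lemma 1 to get
$\varepsilon_L(\langle \Sigma_1, \langle \sigma, e\ e_1 \rangle \rangle) \hookrightarrow^*_L \varepsilon_L(\langle \Sigma'_1, t^1_s \rangle)$.
We know this reduction only includes public ($\sqsubseteq L$) steps, so
the number of steps is lower than or equal to the number of steps in the
first reduction.

We can always find a reduction starting from
$\varepsilon_L(\langle \Sigma_2, \langle \sigma, e\ e_2 \rangle \rangle)$ with
the same number of steps as
$\varepsilon_L(\langle \Sigma_1, \langle \sigma, e\ e_1 \rangle \rangle) \hookrightarrow^*_L \varepsilon_L(\langle \Sigma'_1, t^1_s \rangle)$,
so by the Determinacy Lemma we have
$\varepsilon_L(\langle \Sigma_2, \langle \sigma, e\ e_2 \rangle \rangle) \hookrightarrow^*_L \varepsilon_L(\langle \Sigma'_2, t^2_s \rangle)$.
By Lemma 1 again, we get
$\langle \Sigma_2, \langle \sigma, e\ e_2 \rangle \rangle \hookrightarrow^* \langle \Sigma'_2, t^2_s \rangle$
and therefore
$\langle \Sigma'_1, t^1_s \rangle \approx_L \langle \Sigma'_2, t^2_s \rangle$.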

2 Semantics and typing rules 

Listings 9 and 10 show the missing typing rules for the calculus. Sim- 
ilarly, Listing 11 shows the reduction rules that were not included in 
Section 6. 
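
As a reading aid for the reduction rules, the floating-label behaviour of
label and unlabel (rules Lab and unLab) can be modelled by a small Haskell
sketch. It is an assumption-laden illustration and not the LIO library API:
the two-point lattice, the Ctx record, and the use of Maybe to represent a
stuck configuration are all choices made for this example.

```haskell
-- Assumed two-point lattice: Low ⊑ High; join is max in a total order.
data Label = Low | High deriving (Eq, Ord, Show)

-- Security context sigma: current label and clearance.
data Ctx = Ctx { lbl :: Label, clr :: Label } deriving (Eq, Show)

data Labeled a = Lb Label a deriving (Eq, Show)

-- (Lab): labeling is allowed only when sigma.lbl ⊑ l ⊑ sigma.clr;
-- the context is unchanged. Nothing models a stuck configuration.
label :: Label -> a -> Ctx -> Maybe (Labeled a, Ctx)
label l x s
  | lbl s <= l && l <= clr s = Just (Lb l x, s)
  | otherwise                = Nothing

-- (unLab): the current label floats up to l' = sigma.lbl ⊔ l,
-- provided l' ⊑ sigma.clr.
unlabel :: Labeled a -> Ctx -> Maybe (a, Ctx)
unlabel (Lb l x) s
  | l' <= clr s = Just (x, s { lbl = l' })
  | otherwise   = Nothing
  where
    l' = max (lbl s) l
```

After unlabeling a High value the context label is High, so a subsequent
`label Low` gets stuck; this is the restriction that forces public effects
to be performed by a separate thread.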

3 Application: Mitigating attack on RSA 

As in [3], to highlight the effectiveness of our mitigator
implementation, we re-implement the timing attack on the OpenSSL 0.9.7
RSA implementation as originally presented in [10]. Compared to the
previous dating-website scenario, in which a malicious app deliberately
delayed computations, the covert timing channel in this case is present
due to the non-trivial operations performed in a decryption. Hence, an
attacker can recover an RSA key by repeatedly requesting the RSA oracle,
which may be a web server using SSL, to decrypt different ciphertext
messages.

Listing 9 Typing rules for values.

\[
\vdash \mathsf{true} : \mathsf{Bool}
\qquad
\vdash \mathsf{false} : \mathsf{Bool}
\qquad
\vdash () : ()
\qquad
\frac{\Gamma(x) = \tau}{\Gamma \vdash x : \tau}
\]
\[
\frac{\Gamma[x \mapsto \tau_1] \vdash e : \tau_2}{\Gamma \vdash \lambda x.e : \tau_1 \to \tau_2}
\qquad
\frac{\Gamma \vdash e : \tau \to \tau}{\Gamma \vdash \mathsf{fix}\ e : \tau}
\]

Following [10], one can reveal the secret key indirectly, by recovering
q and exposing the factorization of the RSA modulus N = pq, for q < p. To
do so, the attack proceeds as follows. Firstly, it guesses an initial value
for q, named g, that is between $2^{\log_2 N/2}$ and $2^{\log_2 N/2 - 1}$,
and plots the decryption times (in nanoseconds) for all values of the 2-3
most significant bits. The expected peak in the plot corresponds to our
first approximation of q. Assuming that the most significant i − 1 bits of
q have already been recovered, we recover the ith bit according to:

► Set the i − 1 most significant bits (MSB) of g_l to the i − 1 recovered
MSB of q, leaving the remaining bits unset.

► Let g_hi be the same as g_l but with the ith bit set.

► Measure the time to decrypt g_l, written t_1.

► Measure the time to decrypt g_hi, written t_2.

► Compute Δ = |t_2 − t_1|. If Δ is large, bit i of q is unset; otherwise it
is set.

As in [3, 10], we overcome noise due to the operating system being a
multi-user environment by repeating the decryption for g_l and g_hi
numerous times (in our experiments, 7) and taking the median time
difference. Additionally, to build a strong indicator for the bits of q,
we take the time difference of decrypting a neighborhood of values
g_l, ..., g_l + n and the corresponding neighborhood of high values
g_hi, ..., g_hi + n; in our experiments n = 600.

To evaluate our Haskell mitigator implementation with the RSA attack,
we extended the HsOpenSSL package with bindings for the C OpenSSL RSA
encryption and decryption functions. On a laptop with an Intel Core i7
2620M (2.7GHz) processor with 8GB of RAM, we built our extended Haskell
OpenSSL library with GHC 7.2.1, linking it against the C OpenSSL 0.9.7
library. The attack against a "toy" 512-bit key is shown in Figure 1. We
only carried out the attack against the 256 MSBs as Coppersmith's
algorithm can be used to recover the rest in an efficient manner [11]. As
the figure shows, there is a clear distinction between the cases where a
bit of q is 0 and where it is 1. Finally, applying the fast-doubling time
mitigator with an initial quantum of 500 microseconds, we bound the key
leakage as shown by the results of Figure 2.

Fig. 1. Unmitigated RSA attack. Time difference is in nanoseconds.
(Plot of time difference against bits of q, 128 to 248; series bit=1 and
bit=0, each with a 2-period moving average.)

Fig. 2. Mitigated RSA attack. Time difference is in nanoseconds.
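
The per-bit decision step of the attack described above can be sketched as
follows. This is a simplified illustration, not the attack code used in the
experiments: the function names and the threshold parameter are assumptions
for this example, and the neighborhood averaging over n values is omitted.

```haskell
import Data.List (sort)

-- | Median of a non-empty sample.
median :: [Double] -> Double
median [] = error "median: empty sample"
median xs
  | odd n     = s !! mid
  | otherwise = (s !! (mid - 1) + s !! mid) / 2
  where
    s   = sort xs
    n   = length xs
    mid = n `div` 2

-- | Decide bit i of q from repeated timings (in nanoseconds) of
-- decrypting g_l and g_hi. The median of the pairwise time
-- differences is compared with a threshold (a hypothetical tuning
-- parameter): a large difference means the bit is unset.
bitIsSet :: Double -> [Double] -> [Double] -> Bool
bitIsSet threshold tsLow tsHigh =
  median (zipWith (\a b -> abs (b - a)) tsLow tsHigh) < threshold
```

Taking the median rather than the mean keeps a single outlier measurement,
such as the 1500 ns sample in `bitIsSet 100 [1000, 1010, 990] [1005, 1500, 1000]`,
from flipping the decision.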

4 Evaluation: Overhead of a fork 

To analyze the performance penalty of using forkLIO and waitLIO as
opposed to toLabeled, we micro-benchmarked the two approaches. As
expected, Figure 3 shows that the performance overhead of forking is
unnoticeable.

Fig. 3. Execution time in milliseconds for performing a forkLIO and
waitLIO or toLabeled. The x-axis specifies the number of operations
performed inside the forkLIO or toLabeled.

Listing 10 Typing rules for expressions.



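\[
\frac{\Gamma \vdash e_1 : \tau_1 \to \tau_2 \qquad \Gamma \vdash e_2 : \tau_1}
{\Gamma \vdash e_1\ e_2 : \tau_2}
\qquad
\frac{\Gamma \vdash e_1 : \mathsf{Bool} \qquad \Gamma \vdash e_2 : \tau \qquad \Gamma \vdash e_3 : \tau}
{\Gamma \vdash \mathsf{if}\ e_1\ \mathsf{then}\ e_2\ \mathsf{else}\ e_3 : \tau}
\]
\[
\frac{\Gamma \vdash e_1 : \tau_1 \qquad \Gamma[x \mapsto \tau_1] \vdash e_2 : \tau_2}
{\Gamma \vdash \mathsf{let}\ x = e_1\ \mathsf{in}\ e_2 : \tau_2}
\qquad
\frac{\Gamma \vdash e : \tau}
{\Gamma \vdash \mathsf{return}\ e : \mathsf{LIO}\ \ell\ \tau}
\]
\[
\frac{\Gamma \vdash e_1 : \mathsf{LIO}\ \ell\ \tau_1 \qquad \Gamma \vdash e_2 : \tau_1 \to \mathsf{LIO}\ \ell\ \tau_2}
{\Gamma \vdash e_1 \mathbin{>\!\!>\!\!=} e_2 : \mathsf{LIO}\ \ell\ \tau_2}
\]
\[
\frac{\Gamma \vdash e_1 : \ell \qquad \Gamma \vdash e_2 : \tau}
{\Gamma \vdash \mathsf{label}\ e_1\ e_2 : \mathsf{LIO}\ \ell\ (\mathsf{Labeled}\ \ell\ \tau)}
\qquad
\frac{\Gamma \vdash e : \mathsf{Labeled}\ \ell\ \tau}
{\Gamma \vdash \mathsf{unlabel}\ e : \mathsf{LIO}\ \ell\ \tau}
\]
\[
\frac{\Gamma \vdash e_1 : \ell \qquad \Gamma \vdash e_2 : \mathsf{LIO}\ \ell\ \tau}
{\Gamma \vdash \mathsf{forkLIO}\ e_1\ e_2 : \mathsf{LIO}\ \ell\ (\mathsf{Result}\ \ell\ \tau)}
\qquad
\frac{\Gamma \vdash e : \mathsf{Result}\ \ell\ \tau}
{\Gamma \vdash \mathsf{waitLIO}\ e : \mathsf{LIO}\ \ell\ \tau}
\]
\[
\frac{\Gamma \vdash e_1 : \ell \qquad \Gamma \vdash e_2 : \tau}
{\Gamma \vdash \mathsf{out}\ e_1\ e_2 : \mathsf{LIO}\ \ell\ ()}
\qquad
\frac{\Gamma \vdash e_1 : \ell \qquad \Gamma \vdash e_2 : \tau}
{\Gamma \vdash \mathsf{newLMVar}\ e_1\ e_2 : \mathsf{LIO}\ \ell\ (\mathsf{LMVar}\ \ell\ \tau)}
\]
\[
\frac{\Gamma \vdash e : \mathsf{LMVar}\ \ell\ \tau}
{\Gamma \vdash \mathsf{takeLMVar}\ e : \mathsf{LIO}\ \ell\ \tau}
\qquad
\frac{\Gamma \vdash e_1 : \mathsf{LMVar}\ \ell\ \tau \qquad \Gamma \vdash e_2 : \tau}
{\Gamma \vdash \mathsf{putLMVar}\ e_1\ e_2 : \mathsf{LIO}\ \ell\ ()}
\]
\[
\vdash \mathsf{getLabel} : \mathsf{LIO}\ \ell\ \ell
\qquad
\frac{\Gamma \vdash e : \ell}
{\Gamma \vdash \mathsf{lowerClr}\ e : \mathsf{LIO}\ \ell\ ()}
\qquad
\vdash \mathsf{getClearance} : \mathsf{LIO}\ \ell\ \ell
\]
\[
\frac{\Gamma \vdash e : \mathsf{Labeled}\ \ell\ \tau}
{\Gamma \vdash \mathsf{labelOf}\ e : \ell}
\qquad
\frac{\Gamma \vdash m : \mathsf{LMVar}\ \ell\ \tau}
{\Gamma \vdash \mathsf{labelOfLMVar}\ m : \ell}
\]

Listing 11 Semantics for standard constructs.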

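\[
\begin{aligned}
E ::= [\cdot] &\mid E\ e \mid \mathsf{if}\ E\ \mathsf{then}\ e\ \mathsf{else}\ e \\
&\mid \mathsf{return}\ E \mid E \mathbin{>\!\!>\!\!=} e \\
&\mid \mathsf{lowerClr}\ E \mid \mathsf{labelOf}\ E \mid \cdots
\end{aligned}
\]

\[
\begin{aligned}
\langle \Sigma, \langle \sigma, E[(\lambda x.e_1)\ e_2] \rangle \rangle &\longrightarrow \langle \Sigma, \langle \sigma, E[[e_2/x]e_1] \rangle \rangle \\
\langle \Sigma, \langle \sigma, E[\mathsf{fix}\ e] \rangle \rangle &\longrightarrow \langle \Sigma, \langle \sigma, E[e\ (\mathsf{fix}\ e)] \rangle \rangle \\
\langle \Sigma, \langle \sigma, E[\mathsf{if}\ \mathsf{true}\ \mathsf{then}\ e_1\ \mathsf{else}\ e_2] \rangle \rangle &\longrightarrow \langle \Sigma, \langle \sigma, E[e_1] \rangle \rangle \\
\langle \Sigma, \langle \sigma, E[\mathsf{if}\ \mathsf{false}\ \mathsf{then}\ e_1\ \mathsf{else}\ e_2] \rangle \rangle &\longrightarrow \langle \Sigma, \langle \sigma, E[e_2] \rangle \rangle \\
\langle \Sigma, \langle \sigma, E[\mathsf{let}\ x = e_1\ \mathsf{in}\ e_2] \rangle \rangle &\longrightarrow \langle \Sigma, \langle \sigma, E[[e_1/x]e_2] \rangle \rangle \\
\langle \Sigma, \langle \sigma, E[\mathsf{return}\ v] \rangle \rangle &\longrightarrow \langle \Sigma, \langle \sigma, E[(v)^{\mathrm{LIO}}] \rangle \rangle \\
\langle \Sigma, \langle \sigma, E[(v)^{\mathrm{LIO}} \mathbin{>\!\!>\!\!=} e_2] \rangle \rangle &\longrightarrow \langle \Sigma, \langle \sigma, E[e_2\ v] \rangle \rangle
\end{aligned}
\]
\[
\frac{\sigma.\mathrm{lbl} \sqsubseteq l \sqsubseteq \sigma.\mathrm{clr}}
{\langle \Sigma, \langle \sigma, E[\mathsf{label}\ l\ e] \rangle \rangle \longrightarrow \langle \Sigma, \langle \sigma, E[\mathsf{return}\ (\mathsf{Lb}\ l\ e)] \rangle \rangle}
\]
\[
\frac{l' = \sigma.\mathrm{lbl} \sqcup l \qquad l' \sqsubseteq \sigma.\mathrm{clr} \qquad \sigma' = \sigma[\mathrm{lbl} \mapsto l']}
{\langle \Sigma, \langle \sigma, E[\mathsf{unlabel}\ (\mathsf{Lb}\ l\ e)] \rangle \rangle \longrightarrow \langle \Sigma, \langle \sigma', E[\mathsf{return}\ e] \rangle \rangle}
\]
\[
\frac{\sigma.\mathrm{lbl} \sqsubseteq l \sqsubseteq \sigma.\mathrm{clr} \qquad \sigma' = \sigma[\mathrm{clr} \mapsto l]}
{\langle \Sigma, \langle \sigma, E[\mathsf{lowerClr}\ l] \rangle \rangle \longrightarrow \langle \Sigma, \langle \sigma', E[\mathsf{return}\ ()] \rangle \rangle}
\]
\[
\begin{aligned}
\langle \Sigma, \langle \sigma, E[\mathsf{getLabel}] \rangle \rangle &\longrightarrow \langle \Sigma, \langle \sigma, E[\mathsf{return}\ \sigma.\mathrm{lbl}] \rangle \rangle \\
\langle \Sigma, \langle \sigma, E[\mathsf{getClearance}] \rangle \rangle &\longrightarrow \langle \Sigma, \langle \sigma, E[\mathsf{return}\ \sigma.\mathrm{clr}] \rangle \rangle \\
\langle \Sigma, \langle \sigma, E[\mathsf{labelOf}\ (\mathsf{Lb}\ l\ e)] \rangle \rangle &\longrightarrow \langle \Sigma, \langle \sigma, E[l] \rangle \rangle
\end{aligned}
\]

Abstract. Information flow control allows untrusted code to access
sensitive and trustworthy information without leaking this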
information. However, the presence of covert channels subverts 
this security mechanism, allowing processes to communicate in- 
formation in violation of IFC policies. In this paper, we show that 
concurrent deterministic IFC systems that use time-based schedul- 
ing are vulnerable to a cache-based internal timing channel. We 
demonstrate this vulnerability with a concrete attack on Hails, one 
particular IFC web framework. To eliminate this internal timing 
channel, we implement instruction-based scheduling, a new kind 
of scheduler that is indifferent to timing perturbations from un- 
derlying hardware components, such as the cache, TLB, and CPU 
buses. We show this scheduler is secure against cache-based inter- 
nal timing attacks for applications using a single CPU. To show the 
feasibility of instruction-based scheduling, we have implemented 
a version of Hails that uses the CPU retired-instruction counters 
available on commodity Intel and AMD hardware. We show that 
instruction-based scheduling does not impose significant performance
penalties. Additionally, we formally prove that our modifications to
Hails' underlying IFC system preserve non-interference in the presence
of caches.

1 Introduction

The rise of extensible web applications, like the Facebook Platform, is 
spurring interest in information flow control (IFC) [27, 35]. Popular plat- 
forms like Facebook give approved apps full access to users' sensitive 
data, including the ability to violate security policies set by users. In con- 
trast, IFC allows websites to run untrusted, third-party apps that operate 
on sensitive user data [11, 21], ensuring they abide by security policies 
in a mandatory fashion. 

Recently, Hails [11], a web-platform framework built atop the LIO 
IFC system [39, 40], has been used to implement websites that integrate
third-party untrusted apps. For example, the code-hosting website
GitStar.com, built with Hails, uses untrusted apps to deliver core
features, including a code viewer and wiki. GitStar relies on LIO's IFC 
mechanism to enforce robust privacy policies on user data and code. 

LIO, like other IFC systems, ensures that untrusted code does not 
write data that may have been influenced by sensitive sources to public 
sinks. For example, an untrusted address-book app is allowed to com- 
pute over Alice's friends list and display a stylized version of the list to 
Alice, but it cannot leak any information about her friends to arbitrary 
end-points. The flexibility of IFC makes it particularly suitable for the 
web, where access control lists often prove either too permissive or too 
restrictive. 

However, a key limitation of IFC is the presence of covert channels, 
i.e., "channels" not intended for communication that nevertheless allow 
code to subvert security policies and share information [22]. A great 
deal of research has identified and analyzed covert channels [25]. In 
this work, we focus on the internal timing covert channel, which occurs 
when sensitive data is used to manipulate the timing behavior of threads 
so that other threads can observe the order in which shared public re- 
sources are used [38, 43]. Though we do not believe our solution to the 
internal timing covert channel affects (either positively or negatively) 
other timing channels, such as the external timing covert channel, which 
is derived from measuring external events [1, 5, 12] (e.g., wall-clock), ad- 
dressing these channels is beyond our present scope. 

LIO eliminates the internal timing covert channel by restricting how 
programmers write code. Programmers are required to explicitly decou- 
ple computations that manipulate sensitive data from those that can 
write to public resources, eliminating covert channels by construction. 
However, decoupling only works when all shared resources are mod- 
eled. LIO only considers shared resources that are expressible by the 
programming language, e.g., shared-variables, file descriptors, sema- 
phores, channels, etc. Implicit operating system and hardware state can 
still be exploited to alter the timing behavior of threads, and thus leak
information. Reexamining LIO, we found that the underlying CPU cache
can be used to introduce an internal timing covert channel that leaks
sensitive data. A trivial attack can leak data at 0.75 bits/s and, despite 
the low bandwidth, we were able to leak all the collaborators on a pri- 
vate GitStar.com project in less than a minute. 

Several countermeasures to cache-based attacks have previously been 
considered, primarily in the context of cryptosystems following the work 
of Kocher [18] (see Section 8). Unfortunately, many of the techniques 
are not designed for IFC scenarios. For example, modifying an algo- 
rithm implementation, as in the case of AES [7], does not naturally gen- 
eralize to arbitrary untrusted code. Similarly, flushing or disabling the 
cache when switching protection domains, as suggested in [6, 48], is pro- 
hibitively expensive in systems like Hails, where context switches occur 
hundreds of times per second. Finally, relying on specialized hardware, 
such as partitioned caches [29], which isolate the effects of one partition 
from code using a different partition, restricts the deployability and scal- 
ability of the solution; partitioned caches are not readily available and 
often cannot be partitioned to an arbitrary security lattice. 

This paper describes a countermeasure for cache-based attacks when 
execution is confined to a single CPU. Our method generalizes to arbi- 
trary code, imposes minimal performance overhead, scales to an arbi- 
trary security lattice, and leverages hardware features already present 
in modern CPUs. Specifically, we present an instruction-based sched- 
uler that eliminates internal timing channels in concurrent programs 
that time-slice a single CPU and contend for the same cache, TLB, bus, 
and other hardware facilities. We implement the scheduler for the LIO 
IFC system and demonstrate that, under realistic restrictions, our sched- 
uler eliminates such attacks in Hails web applications. 

Our contributions are as follows. 

► We implement a cache-based internal timing attack for LIO. 

► We close the cache-based covert channel by scheduling user-level 
threads on a single CPU core based on the number of instructions 
they execute (as opposed to the amount of time they execute). Our 
scheduler can be used to implement other concurrent IFC systems 
which implicitly assume instruction-level scheduling (e.g., [13, 14, 
32, 38, 45]). 

► We implement our instruction-based scheduler as part of the Glas- 
gow Haskell Compiler (GHC) runtime system, atop which LIO and 
Hails are built. We use CPU performance counters, prevalent on most 
modern CPUs, to pre-empt threads according to the number of retired
instructions. The measured impact on performance, when compared to
time-based scheduling, is negligible. We believe these techniques to be
applicable to operating systems that enforce IFC, including [20, 26, 46],
though at a higher cost in performance for application code that is
highly optimized for locality (see Section 5).

► We augment the LIO [40] semantics to model the cache and formally
prove that instruction-based scheduling removes leaks due to caches.

The paper is organized as follows. Section 2 discusses cache-based
attacks and existing countermeasures. Section 3 presents our
instruction-based scheduling solution. Section 4 describes our
modifications to GHC's runtime, while Section 5 analyzes their
performance impact. Formal guarantees and discussions of our approach
are detailed in Sections 6 and 7. We describe related work in Section 8
and conclude in Section 9.

2 Cache Attacks and Countermeasures 

The severity of information leakage attacks through the CPU hardware 
cache has been widely considered by the cryptographic community (e.g. 
[28, 31]). Unlike crypto work, where attackers extract sensitive informa- 
tion through the execution of a fixed crypto algorithm, we consider a 
scenario in which the attacker provides arbitrary code in a concurrent 
IFC system. In our scenario, the adversary is a developer that imple- 
ments a Hails app that interfaces with user-sensitive data using LIO li- 
braries. 

We found that, knowing only the cache size of the underlying CPU, 
we can easily build an app that exploits the shared cache to carry out an 
internal timing attack that leaks sensitive data at 0.75 bits/s. Several IFC 
systems, including [13, 14, 32, 38, 40, 45], model internal timing attacks 
and address them by ensuring that the outcome of a race to a public 
resource does not depend on secret data. Unfortunately, these systems 
only account for resources explicitly modeled at the programming lan- 
guage level and not underlying OS or hardware state, such as the CPU 
cache or TLB. Hence, even though the semantics of these systems rely 
on instruction-based scheduling (usually to simplify expressing reduc- 
tion rules), real-world implementations use time-based scheduling for 
which the formal guarantees do not hold. The instruction-based sched- 
uler proposed in this work can be used to make the assumptions of such 
concurrent IFC systems match the situation in practice. In the remain- 
der of this section, we show the internal timing attack that leverages the 
hardware cache. We also discuss several existing countermeasures that 
could be employed by Hails. 

2.1 Example cache attack 

We mount an internal timing attack by influencing the scheduling be- 
havior of threads through the cache. Consider the code shown in Fig- 
ure 1. The attack leaks the secret boolean value secret in thread 1 by 
affecting when thread 2 writes to the public channel relative to thread 3. 






    1. lowArray := new Array[M];
    2. fillArray(lowArray)

    thread 1:
    1. if secret
    2.   then highArray := new Array[M]
    3.        fillArray(highArray)
    4.   else skip

    thread 2:
    1. for i in [1..n]
    2.   skip
    3. readArray(lowArray)
    4. outputLow(1)

    thread 3:
    1. for i in [1..n+m]
    2.   skip
    3. outputLow(0)

Fig. 1. A simple cache attack.

The program starts (lines 1-2) by creating and initializing a public
array lowArray whose size M corresponds to the cache size; fillArray
simply sets every element of the array to 0 (this will place the array
in the cache). The program then spawns three threads that run
concurrently. Assuming a round-robin time-based scheduler, the
execution of the attack proceeds as illustrated in Figure 2, where
secret is set to true (top) and false (bottom), respectively.

► Depending on the secret value secret, thread 1 either performs a no- 
operation (skip on line 4), leaving the cache intact, or evicts lowArray 
from the cache (lines 2-3) by creating and initializing a new (non- 
public) array highArray. 

► We assume that thread 1 takes less than n steps to complete its
execution, a number that can be determined experimentally; in Figure 2,
n is four. Hence, to allow all the effects on the cache due to thread 1
to settle, thread 2 delays its computation by n steps (lines 1-2).
Subsequently, the thread reads every element of the public array
lowArray (line 3), and finally writes 1 to a public output channel
(line 4). Crucial to carrying out the attack, the duration of thread 2's
reads (line 3) depends on the state of the cache: if the cache was
modified by thread 1, i.e., secret is true, thread 2 needs to wait for
all the public data to be retrieved from memory (as opposed to the
cache) before producing an output. This requires evicting highArray
from the cache and fetching lowArray, a process that takes a
non-negligible amount of time. However, if the cache was not touched by
thread 1, i.e., secret is false, thread 2 will get few cache misses and
thus produce its output with no delay.

► We assume that thread 2 takes less than m steps, where m<n, to
complete reading lowArray (line 3) when the reads hit the cache, i.e.,
lowArray was not replaced by highArray. Like n, this metric can be
determined experimentally; in Figure 2, m is three. Using this, thread
3 simply delays its computation by n+m steps (lines 1-2) and then
writes 0 to a public output channel (line 3). The role of thread 3 is
solely to serve as a baseline for thread 2's output: it produces its
output before thread 2 when the latter is filling the cache, i.e.,
secret is true; conversely, it produces an output after thread 2 if
thread 1 did not touch the cache, i.e., secret is false.

Fig. 2. Execution of the cache attack with secret true (top) and false
(bottom).

We remark that the race between thread 2 and thread 3 to write to a 
shared public channel, influenced by the cache state, is precisely what 
facilitates the attack. We described how to leak a single bit, but the attack 
can easily be magnified by wrapping it in a loop. Note also that we have 
assumed the attacker has complete control of the cache — i.e., the cache is 
not affected by other code running in parallel. However, the attack is still 
plausible under weaker assumptions so long as the attacker deals with 
the additional noise, as exemplified by the timing attacks on AES [28]. 
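
The attack of Figure 1 can be replayed in a toy model. The following Python sketch is an illustration only, not the implementation used against LIO: the command costs (1 cycle for a skip, an allocation or an output, 3 cycles to fill highArray, 2 cycles for a cached readArray and 6 for an uncached one), the 4-cycle quantum, and all function names are assumptions made for the example, with n = 4 and m = 3 as in Figure 2. Each command is charged its cycle cost against the running thread's time slice.

```python
from collections import deque

N, M_STEPS, QUANTUM = 4, 3, 4  # n and m as in Figure 2; the quantum is assumed

def make_threads(secret):
    """The three threads of Figure 1 as lists of commands."""
    t1 = ["alloc_high", "fill_high"] if secret else ["skip"]
    t2 = ["skip"] * N + ["read_low", "out1"]
    t3 = ["skip"] * (N + M_STEPS) + ["out0"]
    return deque([t1, t2, t3])

def execute(cmd, state):
    """Run one command and return its cost in cycles (all costs are assumed)."""
    if cmd == "fill_high":
        state["cached"] = "high"          # evicts lowArray from the cache
        return 3
    if cmd == "read_low":
        cost = 2 if state["cached"] == "low" else 6
        state["cached"] = "low"
        return cost
    if cmd == "out1":
        state["output"].append(1)
    elif cmd == "out0":
        state["output"].append(0)
    return 1

def run_time_based(secret):
    """Round-robin scheduling where the slice is measured in cycles."""
    pool = make_threads(secret)
    state = {"cached": "low", "output": []}
    q = QUANTUM
    while pool:
        if q <= 0:                        # slice exhausted: move thread to the back
            pool.rotate(-1)
            q = QUANTUM
        elif not pool[0]:                 # finished thread leaves the pool
            pool.popleft()
            q = QUANTUM
        else:
            q -= execute(pool[0].pop(0), state)
    return state["output"]

print(run_time_based(True), run_time_based(False))
```

In this model the order of the two public outputs reveals the secret: thread 3 writes first when secret is true, and thread 2 writes first when it is false.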

2.2 Existing countermeasures 

The internal timing attack arises as a result of cache effects influencing 
thread-scheduling behavior. Hence, one series of countermeasures ad- 
dresses the problem through low-level CPU features that provide better 
control of the cache. 

Flushing the cache Naively, we can flush the cache on every context switch. 
In the context of Figure 1, this guarantees that, when thread 2 executes 
the readArray instruction, its duration is not affected by thread 1 evict- 
ing lowArray from the cache — the cache will always be flushed on a con- 
text switch, hence thread 3 will always write to the output channel first. 

No-fill cache mode Several architectures, including Intel's Xeon and Pen- 
tium 4, support a cache no-fill mode [15]. In this mode, read/write hits 
access the cache; misses, however, read from and write to memory di- 
rectly, leaving the cache unchanged. As considered by Zhang et al. [48], 
we can execute all threads that operate on non-public data in this mode. 
This approach guarantees that sensitive data cannot affect the cache. 
Unfortunately, threads operating on non-public data and relying on the 
cache will suffer from performance degradation. 

Partitioned cache Another approach is to partition the cache according 
to the number of security levels, as suggested in [48]. Using this
architecture, a thread computing on secret data only accesses the secret
partition, while a thread computing on public data only accesses the
public one. This approach effectively corresponds to giving each
differently-labeled thread access to its own cache and, as a result, the
scheduling behavior of public threads cannot be affected by evicting
data from the cache.

Unfortunately, none of the aforementioned solutions can be used
in systems built with Hails (e.g., GitStar). Flushing the cache is pro- 
hibitively expensive for preemptive systems that perform a context switch 
hundreds of times per second — the impact on performance would gravely 
reduce usability. The no-fill mode solution is well suited for systems 
wherein the majority of the threads operate on public data. In such cases, 
only threads operating on sensitive data will incur a performance penalty. 
However, in the context of Hails, the solution is only slightly less ex- 
pensive than flushing the cache. Hails threads handle HTTP requests 
that operate on individual (non-public) user data, hence most threads 
will not be using the cache. Another consequence of threads handling 
differently-labeled data is that partitioned caches can only be used in a 
limited way (see Section 8). Specifically, to address internal timing at- 
tacks, it is required that we partition the cache according to the number 
of security levels in the lattice. Given that most existing approaches can 
only partition caches up to 16 ways at the OS level [24], and fewer at the
hardware level, an alternative scalable approach is necessary. Moreover, 
neither flushing nor partitioning the cache can handle timing perturba- 
tions arising from other pieces of hardware such as the TLB, buses, etc. 

3 Instruction-based Scheduling 

As the example in Figure 2 shows, races to acquire public resources are 
affected by the cache state, which in turn might be affected by secret val- 
ues. It is important to highlight that the number of instructions executed 
in a given quantum of time might vary depending on the state of the 
cache. It is precisely this variability that reintroduces dangerous races 
into systems. However, the actual set of instructions executed is not af- 
fected by the cache. Hence, we propose scheduling threads according 
to the number of instructions they execute, rather than the amount of 
time they consume. The point at which a thread produces an output (or 
any other visible operation) is determined according to the number of 
instructions it has executed, a measurement unaffected by the amount 
of time it takes to perform a read/write from memory. 

Consider the code in Figure 1 executing atop an instruction-based 
scheduler. An illustration of this is shown in Figure 3. For simplicity of 
exposition, the instruction granularity is at the level of commands (skip, 
readArray, etc.) and therefore context switches are triggered after one 
command gets executed. (In Section 4, we describe a more practical and 
realistic instruction-based scheduler.) Observe that the amount of time 
it takes to execute an instruction has not changed from the time-based 
scheduler of Figure 2. For example, readArray still takes 6 units of time 
when secret is true, and 2 when it is false. Unlike Figure 2, however,
the interleaving between thread 2 and thread 3 did not change depending
on the state of the cache (which did change according to secret).
Therefore, a race to write to the public channel between thread 2 and
thread 3 cannot be caused by the secret, through the cache. The second
thread always executes n+1 = 5 instructions before writing 1 to the
public channel, while the third thread always executes n+m+1 = 8
instructions before writing 0.

Fig. 3. Execution of cache attack program of Figure 1 with secret set to
true (top) and false (bottom). In both executions, we highlight that the
threads execute one "instruction" at a time in a round-robin fashion.
The concurrent threads take the same amount of time to complete
execution as in Figure 2. However, since we use instructions to context
switch threads, the interleaving between threads 2 and 3 is not
influenced by the actions in thread 1, and thus the internal timing
attack does not arise: the threads' output order cannot encode sensitive
data.

Our proposed countermeasure, the implementation of which is de- 
tailed in Section 4, eliminates the cache-based internal timing attacks 
without sacrificing scalability and with a minor performance impact. 
With instruction-based scheduling, we do not require flushing of the 
cache. In this manner, applications can safely utilize the cache to retain 
most of their performance without giving up system security, and un- 
like current partitioned caches, we can scale up to consider arbitrarily 
complex lattices. 
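
The effect of counting instructions instead of time can be checked in a toy model of the program in Figure 1. The Python sketch below is an illustration under assumed parameters, not the GHC scheduler: the cycle costs (2 for a cached readArray, 6 for an uncached one, 3 for filling highArray, 1 otherwise), the default budget of 4 instructions per slice, and the function name are all hypothetical. Every command costs exactly one unit of budget, while its cycle cost is only accumulated for reporting.

```python
from collections import deque

def run_instruction_based(secret, budget=4, n=4, m=3):
    """Round-robin scheduling where the slice is measured in instructions.

    Returns the public outputs in order and the total cycles consumed.
    """
    t1 = ["alloc_high", "fill_high"] if secret else ["skip"]
    t2 = ["skip"] * n + ["read_low", "out1"]
    t3 = ["skip"] * (n + m) + ["out0"]
    pool = deque([t1, t2, t3])
    cached, cycles, output = "low", 0, []
    b = budget
    while pool:
        if b <= 0:                        # budget exhausted: move thread to the back
            pool.rotate(-1)
            b = budget
        elif not pool[0]:                 # finished thread leaves the pool
            pool.popleft()
            b = budget
        else:
            cmd = pool[0].pop(0)
            cost = 1                      # assumed default cycle cost
            if cmd == "fill_high":
                cached, cost = "high", 3  # evicts lowArray
            elif cmd == "read_low":
                cost = 2 if cached == "low" else 6
                cached = "low"
            elif cmd == "out1":
                output.append(1)
            elif cmd == "out0":
                output.append(0)
            cycles += cost                # time still depends on the cache
            b -= 1                        # the scheduler only counts instructions
    return output, cycles

print(run_instruction_based(True), run_instruction_based(False))
```

In this model the run with secret set to true still takes more cycles, but the order of the public outputs is the same for both values of the secret.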

4 Implementation 

We implemented an instruction-based scheduler for LIO. In this section, 
we describe this implementation and detail some key design features we 
believe to be useful when modifying concurrent IFC systems to address 
cache-based timing attacks. 

4.1 LIO and Haskell 

LIO is a Haskell library that exposes concurrency to programmers in the 
form of "green," lightweight threads. Each LIO thread is a native Haskell 
thread that has an associated security level (label) which is used to track 
and control the flow of information to/from the thread. LIO relies on
Haskell libraries for creating new threads and the runtime system for
managing them.

In general, M lightweight Haskell threads may concurrently execute
on N OS threads. (It is common, however, for multiple Haskell threads
to execute on a single OS thread, i.e., an M:1 mapping.) The Haskell
runtime, as implemented by the GHC system, uses a round-robin
scheduler to context switch between concurrently executing threads.
Specifically, the scheduler is invoked whenever a thread blocks or
terminates, or a timer signal alarm is received. The timer is used to
guarantee that the scheduler is periodically executed, allowing the
runtime to implement preemptive scheduling.

4.2 Instruction-based scheduler 

As previously mentioned, timing-based schedulers render systems, such 
as LIO, vulnerable to cache-based internal timing attacks. We implement 
our instruction-based scheduler as a drop-in replacement for the exist- 
ing GHC scheduler, using the number of retired instructions to trigger a 
context switch. 

Specifically, we use performance monitoring units (PMUs) present in 
almost all recent Intel [15] and AMD [3] CPUs. PMUs expose hardware 
performance counters that are typically used by developers to optimize 
code — they provide metrics such as the number of cache misses, instruc- 
tions executed per cycle, branch mispredictions, etc. Importantly, PMUs 
also provide a means for counting the number of retired instructions. 

Using the perfmon2 [9] Linux monitoring interface and helper
user-level library libpfm4, we modified the GHC runtime to configure the
underlying PMU to count the number of retired instructions the Haskell
process is executing. Specifically, with perfmon2 we set a data
performance counter register to $2^{64} - n$, which the CPU increments
upon retiring an instruction.¹ Once the counter overflows, i.e., n
instructions have been retired, perfmon2 is sent a hardware interrupt.
In our implementation, we configured perfmon2 to handle the interrupt by
delivering a signal to the GHC runtime.
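
The counter arithmetic can be illustrated with a small Python sketch. It only models a 64-bit counter in software; the function names are hypothetical and no perfmon2 or libpfm4 call is shown. Arming the counter at 2^64 minus n makes it wrap around to zero after exactly n increments, which is the overflow event that triggers the interrupt.

```python
COUNTER_BITS = 64
MODULUS = 1 << COUNTER_BITS  # a 64-bit counter wraps around at 2**64

def arm_counter(n):
    """Initial counter value that overflows after exactly n increments."""
    return MODULUS - n

def retire(counter, count=1):
    """Advance the counter by `count` retired instructions.

    Returns the new counter value and whether the counter overflowed.
    """
    total = counter + count
    return total % MODULUS, total >= MODULUS

counter = arm_counter(5)
counter, overflowed = retire(counter, 4)
print(overflowed)            # still one instruction short of the budget
counter, overflowed = retire(counter)
print(overflowed, counter)   # the fifth instruction wraps the counter
```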

If threads share no resources, upon receiving a signal, the executing
Haskell thread can immediately save its state and jump to the scheduler.
However, preempting a thread which is operating on a shared memory
space can be dangerous, as the thread may have left memory in an
inconsistent state. (This is the case for many language runtimes, not
solely GHC's.) To avoid this, GHC produces code that contains safe
points where threads may yield. Hence, a signal does not cause an
immediate preemption. Instead, the signal handler simply sets a flag
indicating the arrival of a signal; at the next safe point, the thread
"cooperatively" yields to the scheduler.

¹ Though the bit-width of the hardware counters varies (they are
typically 40 bits wide), perfmon2 internally manages a 64-bit counter.

To ensure liveness, we must guarantee that given any point in
execution, a safe point is reached within n instructions. Though GHC
already inserts many safe points as a means of invoking the garbage
collector (via the scheduler), tight loops that do not perform any
allocation are known to hang execution [10]. Addressing this
eight-year-old bug, which would otherwise be a security concern in LIO,
we modified the compiler to insert safe points on function entry points.
This modification, integrated in the mainline GHC, has almost no effect
on performance and causes only a 7% increase in average binary size.
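
The interplay between the overflow signal and safe points can be sketched with Python generators. This is a simplified model, not GHC's runtime: the Runtime class, the unit of work, the spacing of safe points, and the budget of 3 are assumptions made for the illustration. The signal handler only sets a flag, and a thread gives up the processor only when it reaches a safe point while the flag is set.

```python
from collections import deque

class Runtime:
    def __init__(self, budget):
        self.budget = budget              # instructions per slice (assumed unit)
        self.retired = 0
        self.preempt_requested = False
        self.trace = []

    def retire(self, name):
        """Run one uninterruptible unit of work and count it."""
        self.trace.append(name)
        self.retired += 1
        if self.retired >= self.budget:   # counter overflow delivers a signal
            self.on_signal()

    def on_signal(self):
        """Signal handler: record the request, do not preempt immediately."""
        self.preempt_requested = True
        self.retired = 0

    def safe_point(self):
        """Return True if the thread should yield here."""
        if self.preempt_requested:
            self.preempt_requested = False
            return True
        return False

def worker(rt, name, units, units_per_safe_point):
    for i in range(1, units + 1):
        rt.retire(name)
        if i % units_per_safe_point == 0 and rt.safe_point():
            yield                         # cooperative yield to the scheduler

def run(rt, threads):
    pool = deque(threads)
    while pool:
        thread = pool.popleft()
        try:
            next(thread)
            pool.append(thread)           # yielded: back of the queue
        except StopIteration:
            pass                          # thread finished
    return rt.trace

rt = Runtime(budget=3)
trace = run(rt, [worker(rt, "A", 6, 2), worker(rt, "B", 2, 1)])
print("".join(trace))
```

In this run the signal arrives during thread A's third unit of work, but A only yields after its fourth unit, where its next safe point lies. The sparser the safe points, the longer a thread can overrun its budget, which is why a bound on the distance between safe points matters.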

4.3 Handling IO 

Threads yield at safe points in their execution paths as a result of a 
retired instruction signal. However, there are circumstances in which 
threads would like to explicitly yield prior to the reception of a retired 
instruction signal. In particular, when a thread performs a blocking op- 
eration, it immediately yields to the scheduler, registering itself to wake 
up when the operation completes. Thus, any IO action is a yield which 
allows the thread to give up the rest of its scheduling quantum. 

While yields are not intrinsically unsafe, it is not safe to allow the 
leftover scheduling quantum to be passed on to the next thread. Thus, 
after running any asynchronous IO action, the runtime must reset the re- 
tired instruction counter. Hence, whenever a thread enters the scheduler 
loop due to being blocked, we reset the retired instruction counter. 
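
A small worked example shows one way to read the need for this reset. The Python sketch below is an assumption-laden illustration, not runtime code: the function name and the budget of 100 instructions are hypothetical. Without a reset, the number of instructions the next thread may retire before the next signal depends on how early the previous thread blocked; with a reset, it does not.

```python
def next_slice(budget, used_before_blocking, reset_on_block):
    """Instructions the next thread may retire before the next signal."""
    leftover = budget - used_before_blocking
    return budget if reset_on_block else leftover

# Without a reset, the next thread's slice reflects the blocking thread's behavior.
print(next_slice(100, 10, False), next_slice(100, 70, False))
# With a reset, the next thread always starts with a full budget.
print(next_slice(100, 10, True), next_slice(100, 70, True))
```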

5 Performance Evaluation 

We evaluated the performance of instruction-based scheduling against
existing time-based approaches using the nofib benchmark suite [30].
nofib is the standard benchmarking suite used for measuring the
performance of Haskell implementations.

In our experimental setup, we used the latest development version of 
GHC (the Git master branch as of November 6, 2012). The measurements 
were taken on the same hardware as Hails [11]: a machine with two 
dual-core Intel Xeon E5620 (2.4GHz) processors, and 48GB of RAM. 

We first needed to find an instruction budget, i.e., the number of
instructions to retire before triggering the scheduler. We found a
poorly chosen instruction budget could increase runtime by 100%. To
determine a good parameter, we measured the mean time between
retired-instruction signals with an initially guessed instruction budget
parameter. We then adjusted the parameter so the median test program had
a 10 millisecond mean time-slice (the default quantum size in vanilla
GHC with time-based scheduling) and verified our final choice by
re-running the measurements. For our specific setup, an instruction
budget of approximately 37,100,000 retired instructions corresponded to
a 10 millisecond time quantum. We plot the mean and standard deviation
across all nofib applications with the final tuning parameter in
Figure 4. We found that most programs receive a signal within 2
milliseconds of when they would have normally received the signal using
the standard time-based scheduler. While the instruction budget
parameter will vary across machines, it is relatively simple to
bootstrap this parameter by performing these measurements at startup and
tuning the budget accordingly.

Fig. 4. Mean time between timer signal and retired-instruction signal.
Each point represents a program from nofib, which have been sorted on
the x-axis by their mean time.

Fig. 5. Change to run time from instruction-based scheduling.
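
The tuning step can be sketched as a proportional rescaling. The Python below is a minimal illustration, assuming that the mean time between signals scales linearly with the budget; the function names and the initial guess of 10,000,000 instructions with a measured 2.5 millisecond slice are hypothetical, and only the 37,100,000 figure and the 10 millisecond target come from the measurements above.

```python
def retune_budget(budget, measured_ms, target_ms=10.0):
    """Rescale an instruction budget so the mean slice approaches target_ms.

    Assumes the time between signals is proportional to the budget.
    """
    return round(budget * target_ms / measured_ms)

def instructions_per_second(budget, quantum_ms):
    """Retirement rate implied by a budget that lasts quantum_ms milliseconds."""
    return budget * 1000 // quantum_ms

# Hypothetical first guess: 10,000,000 instructions gave a 2.5 ms mean slice.
print(retune_budget(10_000_000, 2.5))
# Rate implied by the reported 37,100,000 instructions per 10 ms quantum.
print(instructions_per_second(37_100_000, 10))
```

As in the procedure described above, the retuned budget would still need to be verified by re-running the measurements.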

Next, we compared the performance of Haskell's timer-based scheduler
with our instruction-based scheduler. We used a subset of the nofib
benchmark suite called the real benchmark, which consists of "real world
programs", as opposed to synthetic benchmarks (however, results for the
whole nofib suite are comparable). Figure 5 shows the run time of these
programs with both scheduling approaches. With an optimized instruction
budget parameter, instruction-based scheduling has no impact on the run
time of the majority of nofib applications and results in only a very
slight increase in run time for others (about 1%).

This result may seem surprising: instruction-based scheduling
purposely punishes threads with good data locality, so one might expect
a more substantial performance impact. We hypothesize that the impact
is small for two reasons. First, with preemptive scheduling, we are
already inducing cache misses when we switch from running one thread
to another — instruction-based scheduling only perturbs when these pre- 
empts occur, and as seen in Figure 4, these perturbations are very mi- 
nor. Second, modern L2 caches are quite large, meaning that hardware 
is more forgiving of poor data locality — an effect that has been measured 
in the behavior of stock lazy functional programs [2]. 

6 Cache-aware semantics 

In this section we recall relevant design aspects of LIO [40] and ex- 
tend the original formalization to consider how caches affect the tim- 
ing behavior of programs. Importantly, we formalize instruction-based 
scheduling and show how it removes cache-based internal timing covert 
channels. 

6.1 LIO Overview 

At a high level, LIO provides the LIO monad, which is used in place 
of IO. Wrapping standard Haskell libraries, LIO exports a collection of
functions that untrusted code may use to access the filesystem, network,
shared variables, etc. Unlike the standard libraries, which usually return
IO actions, these functions return actions in the LIO monad, thus allowing
LIO to perform label checks before executing a potentially unsafe action. 

Internally, the LIO monad keeps track of a current label, $L_{cur}$. The
current label is effectively a ceiling over the labels of all data that
the current computation may depend on. This label eliminates the need to
label individual definitions and bindings: symbols in scope are
(conceptually) labeled with $L_{cur}$.² Hence, when a computation $C$,
with current label $L_C$, observes an object labeled $L_O$, $C$'s label
is raised to the least upper bound or join of the two labels, written
$L_C \sqcup L_O$. Importantly, the current label governs where the
current computation can write, what labels may be used when creating new
channels or threads, etc. For example, after reading $O$, the
computation should not be able to write to a channel $K$ if $L_O$ is
more restricting than $L_K$; this would potentially leak sensitive
information (about $O$) into a less sensitive channel.

² As described in [39], LIO does, however, allow programmers to
heterogeneously label data they consider sensitive.
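
The floating current label can be illustrated with a two-point lattice. The Python sketch below is not LIO's API: the label names, the Computation class, and its methods are assumptions made for the example, and LIO itself supports arbitrary label lattices. Observing an object joins its label into the current label, and a write is allowed only if the current label can flow to the channel's label.

```python
LEVELS = {"public": 0, "secret": 1}  # assumed two-point lattice

def can_flow_to(source, target):
    """True if data labeled `source` may flow to something labeled `target`."""
    return LEVELS[source] <= LEVELS[target]

def join(a, b):
    """Least upper bound of two labels in the two-point lattice."""
    return a if LEVELS[a] >= LEVELS[b] else b

class Computation:
    def __init__(self, label="public"):
        self.current_label = label

    def observe(self, object_label):
        """Reading an object raises the current label to the join."""
        self.current_label = join(self.current_label, object_label)

    def can_write(self, channel_label):
        """A write is allowed only if the current label flows to the channel."""
        return can_flow_to(self.current_label, channel_label)

c = Computation()
print(c.can_write("public"))   # nothing sensitive observed yet
c.observe("secret")
print(c.current_label, c.can_write("public"), c.can_write("secret"))
```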

Note that an LIO computation can only execute a sub-computation 
on sensitive data by either raising its current label or forking a new 
thread in which to execute this sub-computation. In the former case, 
raising the current label prevents writing to less sensitive endpoints. In 
the latter case, to observe the result (or timing and termination behav- 
ior) of the sub-computation the thread must wait for the forked thread 
to finish, which first raises the current label. A consequence of this de- 
sign is that differently-labeled computations are decoupled, which, as 
mentioned in Section 1, is key to eliminating the internal timing covert 
channel. 

In the next subsection, we will outline the semantics for a cache- 
aware, time-based scheduler where the cache attack described in Sec- 
tion 2 is possible. Moreover, we show that we can easily adapt this se- 
mantics to model the new LIO instruction-based scheduler. 

6.2 Cache-aware semantics 

We model the underlying CPU cache as an abstract memory shared
among all running threads, which we will denote with the symbol $\zeta$.
Every step of the sequential execution relation will affect $\zeta$
according to the current instruction being executed, the runtime
environment, and the existing state of the cache. As in [40], each LIO
thread has a thread-local runtime environment $\sigma$, which contains
the current label $\sigma.\mathrm{lbl}$. The global environment
$\Sigma$, common to all threads, holds references to shared resources.

In addition, we explicitly model the number of machine cycles taken by a
single execution step as a result of the cache. Specifically, the
transition $\zeta \rightharpoonup^{(\Sigma, \sigma, e)}_{k} \zeta'$
captures the parameters that influence the cache ($\Sigma$, $\sigma$,
and $e$) as well as the number of cycles $k$ it takes for the cache to
be updated.

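A cache-aware evaluation step is obtained by merging the reduction
rule of LIO with our formalization of CPU cache as given below:

\[
\frac{\langle \Sigma, \langle \sigma, e \rangle \rangle \xrightarrow{\gamma} \langle \Sigma', \langle \sigma', e' \rangle \rangle \qquad \zeta \rightharpoonup^{(\Sigma, \sigma, e)}_{k} \zeta' \qquad k \geq 1}
     {\langle \Sigma, \langle \sigma, e \rangle \rangle_{\zeta} \xrightarrow{\gamma}_{k} \langle \Sigma', \langle \sigma', e' \rangle \rangle_{\zeta'}}
\]

We read $\langle \Sigma, \langle \sigma, e \rangle \rangle_{\zeta} \xrightarrow{\gamma}_{k} \langle \Sigma', \langle \sigma', e' \rangle \rangle_{\zeta'}$
as "the configuration $\langle \Sigma, \langle \sigma, e \rangle \rangle$
reduces to $\langle \Sigma', \langle \sigma', e' \rangle \rangle$ in one
step, but $k$ machine cycles, producing event $\gamma$ and modifying the
cache from $\zeta$ to $\zeta'$." As in LIO [40], the relation
$\langle \Sigma, \langle \sigma, e \rangle \rangle \xrightarrow{\gamma} \langle \Sigma', \langle \sigma', e' \rangle \rangle$
represents a single execution step from thread expression $e$, under the
run-time environments $\Sigma$ and $\sigma$, to thread expression $e'$
and run-time environments $\Sigma'$ and $\sigma'$. Events are used to
communicate information between the threads and the scheduler, e.g.,
when spawning new threads.

(STEP)
\[
\frac{\langle \Sigma, \langle \sigma, e \rangle \rangle_{\zeta} \longrightarrow_{k} \langle \Sigma', \langle \sigma', e' \rangle \rangle_{\zeta'} \qquad q > 0}
     {\langle \Sigma, \zeta, q, \langle \sigma, e \rangle \triangleleft t_s \rangle \hookrightarrow \langle \Sigma', \zeta', q - k, \langle \sigma', e' \rangle \triangleleft t_s \rangle}
\]

(PREEMPT)
\[
\frac{q \leq 0}
     {\langle \Sigma, \zeta, q, t \triangleleft t_s \rangle \hookrightarrow \langle \Sigma, \zeta, q_i, t_s \triangleright t \rangle}
\]

Fig. 6. Semantics for threadpools under round-robin time-based scheduling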

Figure 6 shows the most important rules of our time-based scheduler in
the presence of cache effects. We elide the rest of the rules for
brevity. The relation $\hookrightarrow$ represents a single evaluation
step for the program threadpool, in contrast with $\longrightarrow$,
which is only for a single thread. Configurations are of the form
$\langle \Sigma, \zeta, q, t_s \rangle$, where $q$ is the number of
cycles available in the current time slice and $t_s$ is a queue of
thread configurations of the form $\langle \sigma, e \rangle$. We use a
standard deque-like interface with operations $\triangleleft$ and
$\triangleright$ for front and back insertion, respectively, i.e.,
$\langle \sigma, e \rangle \triangleleft t_s$ denotes a threadpool in
which the first thread is $\langle \sigma, e \rangle$ while
$t_s \triangleright \langle \sigma, e \rangle$ indicates that
$\langle \sigma, e \rangle$ is the last one.

As in LIO, threads are scheduled in a round-robin fashion. Our
scheduler relies on the number of cycles that each step takes; we
respectively write $q_i$ and $q$ as the initial and remaining number of
cycles assigned to a thread in each quantum. In rule (STEP), the number
of cycles $k$ that the current instruction takes is reflected in the
scheduling quantum. Consequently, threads that compute on data that is
not present in the cache will take more cycles, i.e., have a higher $k$,
so they will run "slower" because they are allowed to perform fewer
reduction steps in the remaining time slice. In practice, this permits
attacks, such as that in Figure 1, where the interleaving of the threads
can be affected by sensitive data. Rule (PREEMPT) is used when the
thread has exhausted its cycle budget, triggering a context switch by
moving the current thread to the end of the queue.

We can adapt this semantics to reflect the behavior of the new
instruction-based scheduler. To this end, we replace the number of
cycles $q$ with an instruction budget; we write $b_i$ for the initial
instruction budget and $b$ for the current budget. Crucially, we change
rule (STEP) into rule (STEP-CA), given by

(STEP-CA)
\[
\frac{\langle \Sigma, \langle \sigma, e \rangle \rangle_{\zeta} \longrightarrow_{k} \langle \Sigma', \langle \sigma', e' \rangle \rangle_{\zeta'} \qquad b > 0}
     {\langle \Sigma, \zeta, b, \langle \sigma, e \rangle \triangleleft t_s \rangle \hookrightarrow \langle \Sigma', \zeta', b - 1, \langle \sigma', e' \rangle \triangleleft t_s \rangle}
\]

Rule (STEP-CA) executes a sequential instruction in the current thread,
provided the instruction budget is not empty ($b > 0$), and updates the
cache accordingly
($\langle \Sigma, \langle \sigma, e \rangle \rangle_{\zeta} \longrightarrow_{k} \langle \Sigma', \langle \sigma', e' \rangle \rangle_{\zeta'}$).
It is important to remark that the effects of the underlying cache
$\zeta$, as indicated by $k$, are intentionally ignored by the
scheduler. This subtle detail captures the essence of removing the
cache-based internal timing channel. (Our formalization of a time-based
scheduler does not ignore $k$ and thus is vulnerable.) Similarly, rule
(PREEMPT) turns into rule (PREEMPT-CA), where $q$ and $q_i$ are
respectively replaced with $b$ and $b_i$ to reflect the fact that there
is an instruction budget instead of a cycle count. The rest of the rules
can be adapted in a straightforward manner. Our rules have the invariant
that the instruction budget gets decremented by one when a thread
executes one instruction.
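
The difference between charging cycles and charging instructions can be made concrete by counting how many reduction steps fit into one slice. The Python sketch below is an illustration under assumed numbers (a 6-cycle quantum, a 3-instruction budget, 2-cycle cache hits and 6-cycle misses); the function names are hypothetical and the code models only the budget bookkeeping of the two step rules, not the full semantics.

```python
def steps_time_based(q_i, costs):
    """Steps taken in one slice when each step is charged its k cycles."""
    q, steps = q_i, 0
    for k in costs:
        if q <= 0:        # cycle budget exhausted: the thread is preempted
            break
        q -= k            # the time-based step rule charges k cycles
        steps += 1
    return steps

def steps_instruction_based(b_i, costs):
    """Steps taken in one slice when each step is charged one instruction."""
    b, steps = b_i, 0
    for _k in costs:
        if b <= 0:        # instruction budget exhausted: the thread is preempted
            break
        b -= 1            # the instruction-based step rule ignores k
        steps += 1
    return steps

hits = [2, 2, 2, 2]       # assumed cost per step when data is cached
misses = [6, 6, 6, 6]     # assumed cost per step when data is not cached

print(steps_time_based(6, hits), steps_time_based(6, misses))
print(steps_instruction_based(3, hits), steps_instruction_based(3, misses))
```

Under the cycle-based rule the thread performs fewer steps per slice when its accesses miss the cache, so the interleaving depends on the cache state. Under the instruction-based rule the number of steps per slice is the same in both cases.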

By changing the cache-aware semantics in this way we obtain a
generalized semantics for LIO. In fact, the previous semantics for LIO
[40] is a special case, with $b_i = 1$, i.e., the threads perform only
one reduction step before a context switch happens. In addition, it is
easy to extend our previous termination-sensitive non-interference
result to the instruction-based semantics. The security guarantees of
our approach are stated below.

Theorem 1 (Termination-sensitive non-interference). Given a program
function $f$, an attacker that observes data at level $L$, and a pair of
inputs $e_1$ and $e_2$ indistinguishable to the attacker, then for every
reduction sequence starting from $f(e_1)$ there is a corresponding
reduction sequence starting from $f(e_2)$ such that both sequences reach
indistinguishable configurations.

Proof Sketch: Our proof relies on the term erasure technique as used in [23, 
34, 39], and follows in a similar fashion to that of [40]. More details can 
be found in Appendix 1. 

7 Limitations 

This section discusses some limitations of our current implementation, 
the significance of these limitations, and how the limitations can be ad- 
dressed. 

Nondeterminism in the hardware counters While the retired-instruction 
counter should be deterministic, in most hardware implementations there 
is some degree of nondeterminism. For example, on most x86 processors
the instruction counter adds an extra instruction every time a hardware
interrupt occurs [44]. This anomaly could be exploited to affect the
behavior of an instruction-based scheduler, causing it to trigger a signal
early. However, this is only a problem if a high thread is able to cause 
a large number of hardware interrupts in the underlying operating sys- 
tem. In the Hails framework, attackers can trigger interrupts by forcing a 
server to frequently receive HTTP responses, i.e., trigger a hardware in- 
terrupt from the network interface card. Hails, however, provides mech- 
anisms to mitigate the effects of external events, using the techniques 
of [4, 47], that can reduce the frequency of such operations.
Nevertheless, the feasibility of such attacks is not immediately clear
and is left as future work.

Scheduler and garbage collector instruction counts For performance rea- 
sons, we do not reset the retired-instruction counter prior to re-entering 
user code. This means that instruction counts include the instructions 
executed from when the previous thread received the signal, to when 
the previous thread yields, to when the next thread is scheduled. While 
this suggests that threads are not completely isolated, we think that this
interaction is extremely difficult to exploit. This is because the number of 
instructions it takes for the scheduler to schedule a new thread is essen- 
tially fixed, and the "time to yield" for any code is highly dependent on 
the compiler, which we assume is not under the control of an adversary. 

Parallelism Unfortunately, we cannot simply run instruction-based sche- 
duling on multiple cores. Threads running in parallel will be able to 
race to public resources. Under normal conditions, such races can
still be influenced by the state of the (L3) cache. Some parallelism is,
however, possible. For instance, we can extend the instruction-based
scheduler to parallelize regions of code that do not share state or have side
effects (e.g., synchronization operations or writes to channels). To this 
end, when a thread wishes to perform a side effect, it is required that all 
the other threads lagging behind (as per retired-instruction count) first 
complete the execution of their side effects. Hence, an implementation 
would rely on a synchronization barrier whenever a side-effecting com- 
putation is executed; at the barrier, the execution of all the side effects 
is done in a pre-determined order. Although we believe that this "opti- 
mization" is viable, we have not implemented it, since it requires major 
modifications to the GHC runtime system and the performance gains 
due to parallelism requiring such strict synchronization barriers are not 
clear. We leave this investigation to future work. 

Even without built-in parallelism, we believe that instruction-based 
scheduling represents a viable and deployable solution when consid- 
ering modern web applications and data-centers. In particular, when an 
application is distributed over multiple machines, these machines do not
share a processor cache and thus can safely run the application
concurrently. Attacks which involve making these machines access shared
external resources can be mitigated in the same fashion as external tim- 
ing attacks [4, 40, 47, 48]. Load-balancing an application in this manner 
is already a well-established technique for deploying applications. 

8 Related work 

Impact of cache on cryptosystems Kocher [18] was one of the first to con- 
sider the security implications of memory access-time in implementa- 
tions of cryptographic primitives and systems. Since then, several at- 
tacks (e.g., [28, 31]) against popular systems have successfully extracted 
secret keys by using the cache as a covert channel. As a countermea- 
sure, several authors propose partitioning the cache (e.g., [29]). Until re- 
cently, partitioned caches have been of limited application in dynamic 
information flow control systems due to the small number of partitions 
available. The recent Vantage cache partition scheme of Sanchez and 
Kozyrakis [37], however, offers tens to hundreds of configurable par- 
titions and high performance. As hardware is not yet available with 
Vantage, it is hard to evaluate its effectiveness for our problem domain. 
However, we expect it to be mostly complementary to our
instruction-based scheduler. Specifically, a partitioned cache can be used to safely
run threads in parallel, each group of threads using instruction-based 
schedulers. Other countermeasures (e.g., [28]) are primarily implementa- 
tion-specific, and, while applicable to cryptographic primitives, they do 
not easily generalize to arbitrary code. 

Language-based information-flow security Several works (e.g., [13]) con- 
sider systems that satisfy possibilistic non-interference [38], which states 
that a concurrent program is secure iff the possible observable events do 
not depend on sensitive data. An alternative notion, probabilistic non- 
interference, considers a concurrent program secure iff the probability 
distribution over observable events is not affected by sensitive data [43]. 
Zdancewic and Myers introduce observational low-determinism [45], which 
intuitively states that the observable behavior of concurrent systems 
must be deterministic. After this seminal work, several authors improve
on each other's definitions of low-determinism (e.g., [14]). Other IFC
systems rely on deterministic semantics and a determined class of run- 
time schedulers (e.g., [32]). 

The lines of work mentioned above assume that the execution of a 
single step is performed in a single unit of time, corresponding to an 
instruction, and show that races to publicly-observable events cannot 
be influenced by secret data. Unfortunately, the presence of the cache 
breaks the correspondence between an instruction and a single unit of
time, making cache attacks viable. Instruction-based scheduling could
be seen as a necessary component in making the previous concurrent 
IFC approaches practical. 

Agat [1] presents a code transformation for sequential programs such 
that both code paths of a branch have the same memory access pattern. 
This eliminates timing covert channels, even those relying on the cache. 
This transformation has been adapted by several authors (e.g., [36]). 
This approach, however, focuses on avoiding attacks relying on the data 
cache, while leaving the instruction cache unattended. 

Russo and Sabelfeld [33] consider non-interference for concurrent 
systems under cooperative and deterministic scheduling. An implemen- 
tation of such a system was presented by Tsai et al. in [41]. This approach 
eliminates internal timing leaks, including those relying on the cache, 
by restricting the use of yields. Cooperative schedulers are intrinsically 
vulnerable to attacks that use termination as a covert channel. In con- 
trast, our solution is able to safely preempt non-terminating computa- 
tions while guaranteeing termination-sensitive non-interference. 

Secure multi-execution [8] preserves confidentiality of data by exe- 
cuting the same sequential program several times, one for each security 
level. In this scenario, the cache-based covert channel can only be re- 
moved in specific configurations [16]. Zhang et al. [48] provide a method 
to mitigate external events when their timing behavior could be affected 
by the underlying hardware. This solution is directly applicable to our 
system when considering external events. Similar to our work, they con- 
sider an abstract model of the hardware machine state which includes a 
description of time. However, their semantics focus on sequential pro- 
grams, wherein attacks due to the cache arise in the form of externally 
visible events. 

Hedin and Sands [12] present a type-system for preventing external 
timing attacks for bytecode. Their semantics is augmented to incorpo- 
rate history, which enables the modeling of cache effects. We proceed 
in a similar manner when extending the original LIO semantics [40] to 
consider caches. 

System security In order to achieve strong isolation, Barthe et al. [6] 
present a model of virtualization which flushes the cache upon switch- 
ing between guest operating systems. Different from our scenario, flush- 
ing the cache in such scenarios is common and does not impact the 
already-costly context-switch. 

Allowing some information leakage, Köpf et al. [19] combine abstract
interpretation and quantitative information flow to analyze leakage
bounds for cache attacks. Kim et al. [17] propose StealthMem, a
system level protection against cache attacks. StealthMem allows pro- 
grams to allocate memory which does not get evicted from the cache. 
In fact, this approach could be seen as a software-level partition of the
cache. StealthMem is capable of enforcing confidentiality for a stronger
attacker model than ours, i.e., they consider programs with access to
the wall clock and possibly running on multiple cores. Like other work on
partitioned caches, StealthMem does not scale to scenarios with
arbitrarily complex security lattices.

Performance monitoring counters The use of PMUs for tasks other than 
performance monitoring is relatively recent. Vogl and Eckert [42]
also use PMUs, but for monitoring applications running within a virtual
machine, allowing instruction-level monitoring of all or specific
instructions. While the mechanism is the same, our goals are different: we
merely seek to replace interrupts generated by a clock-based timer with 
interrupts generated by hardware counters; their work introduces new 
interrupts that trigger vmexits. This causes a considerable slowdown, 
while we achieve no major performance impact. 

9 Conclusion 

Cache-based internal timing attacks constitute a practical set of attacks. 
We present instruction-based scheduling as a solution to remove such 
attacks. Different from simply flushing the cache on a context switch or 
partitioning the cache, this new class of schedulers also removes timing 
perturbations introduced by other components of the underlying hard- 
ware (e.g., the TLB, CPU buses, etc.). To demonstrate the applicability 
of our solution, we implemented a scheduler using the CPU retired- 
instruction counters available on commodity Intel and AMD hardware. 
We integrated the scheduler into the Hails IFC web framework, replac- 
ing the timing-based scheduler. This integration was, in part, possible 
because of the scheduler's negligible performance impact and, in part, 
due to our formal guarantees. Specifically, by generalizing previous re- 
sults, we proved that instruction-based scheduling for LIO preserves 
confidentiality and integrity of data, i.e., termination-sensitive non-in- 
terference. Finally, we remark that our design, implementation, and proof 
are not limited to LIO; we believe that instruction-based scheduling is 
applicable to other concurrent deterministic IFC systems where cache- 
based timing attacks could be a concern.

1 Formalization of LIO with instruction-based scheduling

LIO is formalized as a simply typed Curry-style call-by-name
$\lambda$-calculus with some extensions. Figure 7 defines the formal
syntax for the language. Syntactic categories $v$, $e$, and $\tau$
represent values, expressions, and types, respectively.

The values in the calculus have their usual meaning for typed
$\lambda$-calculi. Symbol $m$ represents LMVars. Special syntax nodes are
added to this category: Lb $v$ $e$, $(e)^{\mathrm{LIO}}$, R $m$, and
$\square$. Node Lb $v$ $e$ denotes the run-time representation of a
labeled value. Similarly, node $(e)^{\mathrm{LIO}}$ denotes the run-time
result of a monadic LIO computation. Node $\square$ denotes the run-time
representation of an empty LMVar. Node R $m$ is the run-time
representation of a Result, implemented as an LMVar, that is used to
access the result produced by spawned computations.



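\[
\begin{array}{lrcl}
\text{Label:}      & l    &      & \\
\text{LMVar:}      & m    &      & \\
\text{Value:}      & v    & ::=  & \mathsf{true} \mid \mathsf{false} \mid () \mid l \mid m \mid x \mid \lambda x.e \mid \mathsf{fix}\ e \\
                   &      & \mid & \mathsf{Lb}\ v\ e \mid (e)^{\mathrm{LIO}} \mid \square \mid \mathsf{R}\ m \\
\text{Expression:} & e    & ::=  & v \mid \bullet \mid e\ e \mid \mathsf{if}\ e\ \mathsf{then}\ e\ \mathsf{else}\ e \\
                   &      & \mid & \mathsf{let}\ x = e\ \mathsf{in}\ e \mid \mathsf{return}\ e \mid e \mathbin{>\!\!>\!\!=} e \\
                   &      & \mid & \mathsf{label}\ e\ e \mid \mathsf{unlabel}\ e \mid \mathsf{getLabel} \\
                   &      & \mid & \mathsf{labelOf}\ e \mid \mathsf{lFork}\ e\ e \mid \mathsf{lWait}\ e \\
                   &      & \mid & \mathsf{newLMVar}\ e\ e \mid \mathsf{takeLMVar}\ e \\
                   &      & \mid & \mathsf{putLMVar}\ e\ e \mid \mathsf{labelOfLMVar}\ e \\
\text{Type:}       & \tau & ::=  & \mathsf{Bool} \mid () \mid \tau \to \tau \mid \ell \mid \mathsf{Labeled}\ \ell\ \tau \\
                   &      & \mid & \mathsf{Result}\ \ell\ \tau \mid \mathsf{LMVar}\ \ell\ \tau \mid \mathsf{LIO}\ \ell\ \tau
\end{array}
\]

Fig. 7. Syntax for values, expressions, and types.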



Expressions are composed of values ($v$), the special node $\bullet$,
representing an erased term, function applications ($e\ e$), conditional
branches (if $e$ then $e$ else $e$), and local definitions
(let $x = e$ in $e$). Additionally, expressions may involve operations
related to monadic computations in the LIO monad. More precisely,
return $e$ and $e$ >>= $e$ represent the monadic return and bind
operations. Monadic operations related to the manipulation of labeled
values inside the LIO monad are given by label and unlabel. Expression
unlabel $e$ acquires the content of the labeled value $e$ while in an LIO
computation. Expression label $e_1$ $e_2$ creates a labeled value, with
label $e_1$, of the result obtained by evaluating the LIO computation
$e_2$. Expression lFork $e_1$ $e_2$ spawns a thread that computes $e_2$
and returns a handle with label $e_1$. Expression lWait $e$ inspects the
value returned by the spawned computation whose result is accessed by
the handle $e$. Creating, reading, and writing labeled MVars are
respectively captured by expressions newLMVar, takeLMVar, and putLMVar.

We consider standard types for Booleans (Bool), unit (()), and function
($\tau \to \tau$) values. Type $\ell$ describes security labels. Type
Result $\ell$ $\tau$ denotes handles used to access labeled results
produced by spawned computations, where the results are of type $\tau$
and labeled with labels of type $\ell$. Type LMVar $\ell$ $\tau$
describes labeled MVars, with labels of type $\ell$ and storing values
of type $\tau$. Type LIO $\ell$ $\tau$ represents monadic LIO
computations, with a result type $\tau$ and security labels of type
$\ell$.

As in [40], we consider that each thread has a thread-local runtime
environment $\sigma$, which contains the current label $\sigma.\mathrm{lbl}$
and the current clearance $\sigma.\mathrm{clr}$. The global environment
$\Sigma$, common to every thread, holds the global memory store $\phi$,
which is a mapping from LMVar names to Lb nodes.

The relation
$\langle \Sigma, \langle \sigma, e \rangle \rangle \xrightarrow{\gamma} \langle \Sigma', \langle \sigma', e' \rangle \rangle$
represents a single execution step from thread $e$, under the run-time
environments $\Sigma$ and $\sigma$, to thread $e'$ and run-time
environments $\Sigma'$ and $\sigma'$. (This relation does not account
for the effects of the cache.) We say that $e$ reduces to $e'$ in one
step. Symbol $\gamma$ ranges over the internal events triggered by
threads. We utilize internal events to communicate between the threads
and the scheduler, e.g., when spawning new threads.

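We show the most relevant rules for $\xrightarrow{\gamma}$ in Figure 8.
Rule (Lab) generates a labeled value if and only if the label is between
the current label and clearance of the LIO computation. Rule (Unlab)
requires that, when the content of a labeled value is "retrieved" and
used in an LIO computation, the current label is raised
($\sigma' = \sigma[\mathrm{lbl} \mapsto l']$, where
$l' = \sigma.\mathrm{lbl} \sqcup l$), thus capturing the fact that the
remaining computation might depend on $e$. Rule (lFork) allows for the
creation of a thread and generates the internal event fork($e'$), where
$e'$ is the computation to spawn. The rule allocates a new LMVar in
order to store the result produced by the spawned thread
($e' = e$ >>= $\lambda x.\,$putLMVar $m$ $x$). Using that LMVar, the rule
provides a handle to access the thread's result (return (R $m$)). Rule
(lWait) simply uses the LMVar for the handle. Rule (nLMVar) describes
the creation of a new LMVar with a label bounded by the current label
and clearance
($\sigma.\mathrm{lbl} \sqsubseteq l \sqsubseteq \sigma.\mathrm{clr}$).
Rule (tLMVar) raises the current label
($\sigma' = \sigma[\mathrm{lbl} \mapsto \sigma.\mathrm{lbl} \sqcup l]$)
when emptying ($\Sigma.\phi[m \mapsto \mathsf{Lb}\ l\ \square]$) its
content ($\Sigma.\phi(m) = \mathsf{Lb}\ l\ e$). Similarly, considering
the security level $l$ of an LMVar, rule (pLMVar) raises the current
label
($\sigma' = \sigma[\mathrm{lbl} \mapsto \sigma.\mathrm{lbl} \sqcup l]$)
when filling ($\Sigma.\phi[m \mapsto \mathsf{Lb}\ l\ e]$) its content
($\Sigma.\phi(m) = \mathsf{Lb}\ l\ \square$). Note that both takeLMVar
and putLMVar observe if the LMVar is empty in order to proceed to modify
its content. Precisely, takeLMVar and putLMVar perform a read and a
write of the mutable location. Operations on LMVar are bi-directional
and consequently the rules (tLMVar) and (pLMVar) require not only that
the label of the mentioned LMVar be between the current label and
current clearance of the thread
($\sigma.\mathrm{lbl} \sqsubseteq l \sqsubseteq \sigma.\mathrm{clr}$),
but that the current label be raised appropriately.

(Lab)
\[
\frac{\sigma.\mathrm{lbl} \sqsubseteq l \sqsubseteq \sigma.\mathrm{clr}}
     {\langle \Sigma, \langle \sigma, \mathbf{E}[\mathsf{label}\ l\ e] \rangle \rangle \longrightarrow \langle \Sigma, \langle \sigma, \mathbf{E}[\mathsf{return}\ (\mathsf{Lb}\ l\ e)] \rangle \rangle}
\]

(Unlab)
\[
\frac{l' = \sigma.\mathrm{lbl} \sqcup l \qquad l' \sqsubseteq \sigma.\mathrm{clr} \qquad \sigma' = \sigma[\mathrm{lbl} \mapsto l']}
     {\langle \Sigma, \langle \sigma, \mathbf{E}[\mathsf{unlabel}\ (\mathsf{Lb}\ l\ e)] \rangle \rangle \longrightarrow \langle \Sigma, \langle \sigma', \mathbf{E}[\mathsf{return}\ e] \rangle \rangle}
\]

(lFork)
\[
\frac{\begin{array}{c}
        \sigma.\mathrm{lbl} \sqsubseteq l \sqsubseteq \sigma.\mathrm{clr} \\
        \Sigma' = \Sigma[\phi \mapsto \Sigma.\phi[m \mapsto \mathsf{Lb}\ l\ \square]] \qquad e' = e \mathbin{>\!\!>\!\!=} \lambda x.\,\mathsf{putLMVar}\ m\ x \qquad m\ \text{fresh}
      \end{array}}
     {\langle \Sigma, \langle \sigma, \mathbf{E}[\mathsf{lFork}\ l\ e] \rangle \rangle \xrightarrow{\mathrm{fork}(e')} \langle \Sigma', \langle \sigma, \mathbf{E}[\mathsf{return}\ (\mathsf{R}\ m)] \rangle \rangle}
\]

(lWait)
\[
\frac{}
     {\langle \Sigma, \langle \sigma, \mathbf{E}[\mathsf{lWait}\ (\mathsf{R}\ m)] \rangle \rangle \longrightarrow \langle \Sigma, \langle \sigma, \mathbf{E}[\mathsf{takeLMVar}\ m] \rangle \rangle}
\]

(nLMVar)
\[
\frac{\sigma.\mathrm{lbl} \sqsubseteq l \sqsubseteq \sigma.\mathrm{clr} \qquad \Sigma' = \Sigma[\phi \mapsto \Sigma.\phi[m \mapsto \mathsf{Lb}\ l\ e]] \qquad m\ \text{fresh}}
     {\langle \Sigma, \langle \sigma, \mathbf{E}[\mathsf{newLMVar}\ l\ e] \rangle \rangle \longrightarrow \langle \Sigma', \langle \sigma, \mathbf{E}[\mathsf{return}\ m] \rangle \rangle}
\]

(tLMVar)
\[
\frac{\begin{array}{c}
        \Sigma.\phi(m) = \mathsf{Lb}\ l\ e \qquad e \neq \square \qquad \sigma.\mathrm{lbl} \sqsubseteq l \sqsubseteq \sigma.\mathrm{clr} \\
        \sigma' = \sigma[\mathrm{lbl} \mapsto \sigma.\mathrm{lbl} \sqcup l] \qquad \Sigma' = \Sigma[\phi \mapsto \Sigma.\phi[m \mapsto \mathsf{Lb}\ l\ \square]]
      \end{array}}
     {\langle \Sigma, \langle \sigma, \mathbf{E}[\mathsf{takeLMVar}\ m] \rangle \rangle \longrightarrow \langle \Sigma', \langle \sigma', \mathbf{E}[\mathsf{return}\ e] \rangle \rangle}
\]

(pLMVar)
\[
\frac{\begin{array}{c}
        \Sigma.\phi(m) = \mathsf{Lb}\ l\ \square \qquad \sigma.\mathrm{lbl} \sqsubseteq l \sqsubseteq \sigma.\mathrm{clr} \\
        \sigma' = \sigma[\mathrm{lbl} \mapsto \sigma.\mathrm{lbl} \sqcup l] \qquad \Sigma' = \Sigma[\phi \mapsto \Sigma.\phi[m \mapsto \mathsf{Lb}\ l\ e]]
      \end{array}}
     {\langle \Sigma, \langle \sigma, \mathbf{E}[\mathsf{putLMVar}\ m\ e] \rangle \rangle \longrightarrow \langle \Sigma', \langle \sigma', \mathbf{E}[\mathsf{return}\ ()] \rangle \rangle}
\]

Fig. 8. Semantics for expressions.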
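
The side conditions of rules (Lab) and (Unlab) can be read as small partial functions on the thread-local environment. The sketch below assumes a two-point lattice (Low below High) and uses hypothetical names (`labelStep`, `unlabelStep`, `Env`); it mirrors the rules and is not the LIO library API.

```haskell
-- Sketch of the (Lab) and (Unlab) checks over a two-point lattice.
-- The names below are illustrative and are not the LIO library API.

data Label = Low | High deriving (Eq, Ord, Show)

-- | The "can flow to" relation of the lattice.
canFlowTo :: Label -> Label -> Bool
canFlowTo = (<=)

-- | Least upper bound (join) of two labels.
lub :: Label -> Label -> Label
lub = max

-- | Thread-local environment: current label and clearance.
data Env = Env { lbl :: Label, clr :: Label } deriving (Eq, Show)

-- | Run-time labeled value (the Lb node).
data Labeled a = Lb Label a deriving (Eq, Show)

-- | (Lab): succeeds only if lbl <= l <= clr; the environment is unchanged.
labelStep :: Env -> Label -> a -> Maybe (Env, Labeled a)
labelStep env l x
  | lbl env `canFlowTo` l && l `canFlowTo` clr env = Just (env, Lb l x)
  | otherwise = Nothing

-- | (Unlab): raises the current label to the join of lbl and l,
-- provided the result stays below the clearance.
unlabelStep :: Env -> Labeled a -> Maybe (Env, a)
unlabelStep env (Lb l x)
  | l' `canFlowTo` clr env = Just (env { lbl = l' }, x)
  | otherwise = Nothing
  where
    l' = lbl env `lub` l
```

For instance, `unlabelStep (Env Low High) (Lb High 'x')` yields the value together with an environment whose current label is now High, while the same call under clearance Low is rejected. Returning `Nothing` stands in for a stuck configuration in this model.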



1.1 Cache-aware semantics using instruction-based scheduling 

Figure 9 presents cache-aware reduction rules for concurrent execution
using instruction-based scheduling. The configurations for this relation
are very similar to the ones for time-based scheduling in Figure 6 except
that we use an instruction budget $b$ rather than a time quantum $q$. We
write $b_i$ for the initial budget for threads.

The main difference between these semantics and the time-based
ones is the cache-aware transition rule (Step-CA). In this rule, the
number of cycles $k$ that the current instruction takes is ignored by
the scheduler, counting as one instruction regardless of the time its
execution took.



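(Step-CA)
\[
\frac{\langle \Sigma, \langle \sigma, e \rangle \rangle_{\zeta} \longrightarrow_{k} \langle \Sigma', \langle \sigma', e' \rangle \rangle_{\zeta'} \qquad b > 0}
     {\langle \Sigma, \zeta, b, \langle \sigma, e \rangle \lhd t_s \rangle \hookrightarrow \langle \Sigma', \zeta', b - 1, \langle \sigma', e' \rangle \lhd t_s \rangle}
\]

(Preempt-CA)
\[
\frac{b \leq 0}
     {\langle \Sigma, \zeta, b, t \lhd t_s \rangle \hookrightarrow \langle \Sigma, \zeta, b_i, t_s \rhd t \rangle}
\]

(No-Step-CA)
\[
\frac{\langle \Sigma, t \rangle_{\zeta} \not\longrightarrow \qquad t = \langle \sigma, e \rangle \qquad e \neq v}
     {\langle \Sigma, \zeta, b, t \lhd t_s \rangle \hookrightarrow \langle \Sigma, \zeta, b_i, t_s \rhd t \rangle}
\]

(Fork-CA)
\[
\frac{\langle \Sigma, t \rangle_{\zeta} \xrightarrow{\mathrm{fork}(e)}_{k} \langle \Sigma', \langle \sigma, e' \rangle \rangle_{\zeta'} \qquad t_{\mathrm{new}} = \langle \sigma, e \rangle \qquad b > 0}
     {\langle \Sigma, \zeta, b, t \lhd t_s \rangle \hookrightarrow \langle \Sigma', \zeta', b - 1, \langle \sigma, e' \rangle \lhd t_s \rhd t_{\mathrm{new}} \rangle}
\]

(Exit-CA)
\[
\frac{\langle \Sigma, t \rangle_{\zeta} \longrightarrow_{k} \langle \Sigma', \langle \sigma, v \rangle \rangle_{\zeta'} \qquad b > 0}
     {\langle \Sigma, \zeta, b, t \lhd t_s \rangle \hookrightarrow \langle \Sigma', \zeta', b_i, t_s \rangle}
\]

Fig. 9. Semantics for threadpools under round-robin instruction-based
scheduling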



1.2 Security guarantees 

In this section, we show that LIO computations satisfy
termination-sensitive non-interference. As in [23, 34, 39], we prove
this property by using the term erasure technique. The erasure function
$\varepsilon_L$ rewrites data at security levels that the attacker
cannot observe into the syntax node $\bullet$.

The function $\varepsilon_L$ is defined in such a way that
$\varepsilon_L(e)$ contains no information above level $L$, i.e., the
function $\varepsilon_L$ replaces all the information more sensitive
than $L$ in $e$ with a hole ($\bullet$). In most of the cases, the
erasure function is simply applied homomorphically (e.g.,
$\varepsilon_L(e_1\ e_2) = \varepsilon_L(e_1)\ \varepsilon_L(e_2)$). For
run expressions, the erasure function is mapped into all threads; all
threads with a current label above $L$ are removed from the pool
(filter $(\lambda \langle \sigma, e \rangle.\, e \not\equiv \bullet)$
(map $\varepsilon_L$ $t_s$), where $\equiv$ denotes syntactic
equivalence). The computation performed in a certain sequential
configuration is erased if the current label is above $L$. For runtime
environments and stores, we map the erasure function into their
components. Similarly, a labeled value is erased if the label assigned
to it is above $L$.
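
To make the shape of the erasure function concrete, the following sketch implements it for a deliberately tiny fragment of the calculus. Several things here are assumptions made for illustration: the two-point lattice, the reduced `Expr` type, the choice to keep the label of an erased labeled value while replacing its contents with the hole, and all function names.

```haskell
-- Sketch of the erasure function over a tiny fragment of the calculus.
-- Labels form a two-point lattice; all names are illustrative.

data Label = Low | High deriving (Eq, Ord, Show)

data Expr
  = Hole            -- the erased term
  | Unit
  | App Expr Expr
  | Lb Label Expr   -- run-time labeled value
  deriving (Eq, Show)

-- | A thread: its current label together with the expression it runs.
data Thread = Thread Label Expr deriving (Eq, Show)

-- | Erase everything an attacker at level l cannot observe.
eraseExpr :: Label -> Expr -> Expr
eraseExpr _ Hole = Hole
eraseExpr _ Unit = Unit
eraseExpr l (App e1 e2) = App (eraseExpr l e1) (eraseExpr l e2) -- homomorphic case
eraseExpr l (Lb l' e)
  | l' <= l = Lb l' (eraseExpr l e)
  | otherwise = Lb l' Hole -- assumption: label kept, contents erased

-- | A thread whose current label is above l is erased entirely.
eraseThread :: Label -> Thread -> Thread
eraseThread l (Thread cur e)
  | cur <= l = Thread cur (eraseExpr l e)
  | otherwise = Thread cur Hole

-- | Erase a pool: map erasure over the threads, then drop erased ones.
erasePool :: Label -> [Thread] -> [Thread]
erasePool l = filter notErased . map (eraseThread l)
  where
    notErased (Thread _ e) = e /= Hole
```

On the pool `[Thread Low (App Unit (Lb High Unit)), Thread High Unit]`, erasing at level Low keeps only the first thread and replaces the secret payload by `Hole`; applying `erasePool Low` a second time changes nothing, which matches the idempotence property used in the proofs.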

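Following the definition of the erasure function, we introduce a new
evaluation relation $\longrightarrow_L$ as follows:

\[
\frac{\langle \Sigma, \langle \sigma, t \rangle \rangle \longrightarrow \langle \Sigma', \langle \sigma', t' \rangle \rangle}
     {\langle \Sigma, \langle \sigma, t \rangle \rangle \longrightarrow_L \varepsilon_L(\langle \Sigma', \langle \sigma', t' \rangle \rangle)}
\]

The relation $\longrightarrow_L$ guarantees that confidential data, i.e.,
data not below level $L$, is erased as soon as it is created. We write
$\longrightarrow_L^{*}$ for the reflexive and transitive closure of
$\longrightarrow_L$.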

In order to prove non-interference, we will establish a simulation
relation between $\longrightarrow^{*}$ and $\longrightarrow_L^{*}$
through the erasure function: erasing all secret data and then taking
evaluation steps in $\longrightarrow_L$ is equivalent to taking steps in
$\longrightarrow$ first, and then erasing all secret values in the
resulting configuration. Note that this relation would not hold if
information from some level above $L$ was being leaked by the program.
In the rest of this section, we only consider well-typed terms to ensure
there are no stuck configurations.

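We start by showing that the evaluation relation $\longrightarrow_L$ is
deterministic.

Proposition 1 (Determinacy of $\longrightarrow_L$). If
$\langle \Sigma, t \rangle_{\zeta} \longrightarrow_L \langle \Sigma', t' \rangle_{\zeta'}$
and
$\langle \Sigma, t \rangle_{\zeta} \longrightarrow_L \langle \Sigma'', t'' \rangle_{\zeta''}$,
then
$\langle \Sigma', t' \rangle_{\zeta'} = \langle \Sigma'', t'' \rangle_{\zeta''}$.

Proof. By induction on expressions and evaluation contexts, showing
there is always a unique redex in every step.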

The next lemma establishes a simulation between $\hookrightarrow^{*}$
and $\hookrightarrow_L^{*}$.

Lemma 1 (Many-step simulation). If
$\langle \Sigma, \zeta, b, t_s \rangle \hookrightarrow^{*} \langle \Sigma', \zeta', b', t_s' \rangle$,
then
$\varepsilon_L(\langle \Sigma, \zeta, b, t_s \rangle) \hookrightarrow_L^{*} \varepsilon_L(\langle \Sigma', \zeta', b', t_s' \rangle)$.

Proof. In order to prove this result, we rely on properties of the erasure
function, such as the fact that it is idempotent and homomorphic to the
application of evaluation contexts and substitution. We show that the
result holds by case analysis on the rule used to derive
$\langle \Sigma, t_s \rangle \hookrightarrow^{*} \langle \Sigma', t_s' \rangle$,
and considering different cases for threads whose current label is below
(or not) level $L$.

The $L$-equivalence relation $\approx_L$ is an equivalence relation
between configurations (and their parts), defined as the equivalence
kernel of the erasure function $\varepsilon_L$:
$\langle \Sigma, \zeta, b, t_s \rangle \approx_L \langle \Sigma', \zeta', b', r_s \rangle$
iff
$\varepsilon_L(\langle \Sigma, \zeta, b, t_s \rangle) = \varepsilon_L(\langle \Sigma', \zeta', b', r_s \rangle)$.
If two configurations are $L$-equivalent, they agree on all data below
or at level $L$, i.e., they cannot be distinguished by an attacker at
level $L$. Note that two queues are $L$-equivalent iff the threads with
current label no higher than $L$ are pairwise $L$-equivalent in the
order that they appear in the queue.

The next theorem shows the non-interference property. It essentially 
states that if we take two executions of a program with two L-equivalent 
inputs, then for every intermediate step of the computation of the first 
run, there is a corresponding step in the computation of the second run 
which results in an L-equivalent configuration. 

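Theorem 2 (Termination-sensitive non-interference). Given a computation
$e$ (with no Lb, $(\cdot)^{\mathrm{LIO}}$, $\square$, R, and $\bullet$)
where
$\Gamma \vdash e : \mathsf{Labeled}\ \ell\ \tau \to \mathsf{LIO}\ \ell\ (\mathsf{Labeled}\ \ell\ \tau')$,
an attacker at level $L$, an initial security context $\sigma$, runtime
environments $\Sigma_1$ and $\Sigma_2$ where
$\Sigma_1.\phi = \Sigma_2.\phi = \emptyset$, and initial cache states
$\zeta_1$ and $\zeta_2$, then

\[
\begin{array}{l}
\forall e_1\, e_2.\ (\Gamma \vdash e_i : \mathsf{Labeled}\ \ell\ \tau)_{i=1,2} \;\wedge\; e_1 \approx_L e_2 \\
\quad \wedge\; \langle \Sigma_1, \zeta_1, b_i, \langle \sigma, e\ e_1 \rangle \rangle \hookrightarrow^{*} \langle \Sigma_1', \zeta_1', b_1', t_s^1 \rangle \\
\quad \Longrightarrow\; \exists\, \Sigma_2'\, \zeta_2'\, b_2'\, t_s^2.\ \langle \Sigma_2, \zeta_2, b_i, \langle \sigma, e\ e_2 \rangle \rangle \hookrightarrow^{*} \langle \Sigma_2', \zeta_2', b_2', t_s^2 \rangle \\
\quad \phantom{\Longrightarrow}\; \wedge\; \langle \Sigma_1', \zeta_1', b_1', t_s^1 \rangle \approx_L \langle \Sigma_2', \zeta_2', b_2', t_s^2 \rangle
\end{array}
\]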

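Proof. Take
$\langle \Sigma_1, \zeta_1, b_i, \langle \sigma, e\ e_1 \rangle \rangle \hookrightarrow^{*} \langle \Sigma_1', \zeta_1', b_1', t_s^1 \rangle$
and apply Lemma 1 to get
$\varepsilon_L(\langle \Sigma_1, \zeta_1, b_i, \langle \sigma, e\ e_1 \rangle \rangle) \hookrightarrow_L^{*} \varepsilon_L(\langle \Sigma_1', \zeta_1', b_1', t_s^1 \rangle)$.
We know this reduction only includes public ($\sqsubseteq L$) steps, so
the number of steps is lower than or equal to the number of steps in the
first reduction.

We can always find a reduction starting from
$\varepsilon_L(\langle \Sigma_2, \zeta_2, b_i, \langle \sigma, e\ e_2 \rangle \rangle)$
with the same number of steps as
$\varepsilon_L(\langle \Sigma_1, \zeta_1, b_i, \langle \sigma, e\ e_1 \rangle \rangle) \hookrightarrow_L^{*} \varepsilon_L(\langle \Sigma_1', \zeta_1', b_1', t_s^1 \rangle)$,
so by determinacy (Proposition 1) we have
$\varepsilon_L(\langle \Sigma_2, \zeta_2, b_i, \langle \sigma, e\ e_2 \rangle \rangle) \hookrightarrow_L^{*} \varepsilon_L(\langle \Sigma_2', \zeta_2', b_2', t_s^2 \rangle)$.
By Lemma 1 again, we get
$\langle \Sigma_2, \zeta_2, b_i, \langle \sigma, e\ e_2 \rangle \rangle \hookrightarrow^{*} \langle \Sigma_2', \zeta_2', b_2', t_s^2 \rangle$
and therefore
$\langle \Sigma_1', \zeta_1', b_1', t_s^1 \rangle \approx_L \langle \Sigma_2', \zeta_2', b_2', t_s^2 \rangle$.

Abstract. Information-flow control (IFC) is a security mechanism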
conceived to allow untrusted code to manipulate sensitive data 
without compromising confidentiality. Unfortunately, untrusted 
code might exploit some covert channels in order to reveal infor- 
mation. In this paper, we focus on the LIO concurrent IFC system. 
By leveraging the effects of hardware caches (e.g., the CPU cache), 
LIO is susceptible to attacks that leak information through the 
internal timing covert channel. We present a resumption-based ap- 
proach to address such attacks. Resumptions provide fine-grained 
control over the interleaving of thread computations at the library 
level. Specifically, we remove cache-based attacks by enforcing 
that every thread yield after executing an "instruction," i.e., atomic 
action. Importantly, our library allows for porting the full LIO 
library: our resumption approach handles local state and exceptions,
both features present in LIO. To compensate for the performance
degradation due to library-level thread scheduling, we provide
two novel primitives. First, we supply a primitive for securely
a primitive for controlling the granularity of "instructions"; this 
allows developers to adjust the frequency of context switching to 
suit application demands.

1 Introduction

Popular website platforms, such as Facebook, run third-party applica- 
tions (apps) to enhance the user experience. Unfortunately, in most of 
today's platforms, once an app is installed it is usually granted full or 
partial access to the user's sensitive data; the users have no guarantees
that their data is not arbitrarily exfiltrated once apps are granted access
to it [18]. As demonstrated by Hails [9], information-flow control (IFC) 
addresses many of these limitations by restricting how sensitive data is 
disseminated. While promising, IFC systems are not impervious to at- 
tacks; the presence of covert channels allows attackers to leak sensitive 
information. 

Covert channels are mediums not intended for communication, which 
nevertheless can be used to carry and, thus, reveal information [19]. In 
this work, we focus on the internal timing covert channel [33]. This channel 
emanates from the mere presence of concurrency and shared resources. 
A system is said to have an internal timing covert channel when an
attacker can, in order to reveal sensitive data, alter the order of public
events by affecting the timing behavior of threads. To avoid such attacks, several
authors propose decoupling computations manipulating sensitive data 
from those writing into public resources (e.g., [4, 5, 27, 30, 35]). 

Decoupling computations by security levels only works when all 
shared resources are modeled. Similar to most IFC systems, the concur- 
rent IFC system LIO [35] only models shared resources at the program- 
ming language level and does not explicitly consider the effects of hard- 
ware. As shown in [37], LIO threads can exploit the underlying CPU 
cache to leak information through the internal timing covert channel. 

We propose using resumptions to model interleaved computations. 
(We refer the interested reader to [10] for an excellent survey of resump- 
tions.) A resumption is either a (computed) value or an atomic action 
which, when executed, returns a new resumption. By expressing thread 
computations as a series of resumptions, we can leverage resumptions 
for controlling concurrency. Specifically, we can interleave atomic ac- 
tions, or "instructions," from different threads, effectively forcing each 
thread to yield at deterministic points. This ensures that scheduling is 
not influenced by underlying caches and thus cannot be used to leak 
secret data. We address the attacks on the recent version of LIO [35] by 
implementing a Haskell library which ports the LIO API to use resump- 
tions. Since LIO threads possess local state and handle exceptions, we 
extend resumptions to account for these features. 
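
As an illustration of the idea, the sketch below gives one possible encoding of resumptions over IO together with a round-robin scheduler that runs exactly one atomic action per turn. It is a minimal example under our own naming assumptions (`Res`, `atom`, `andThen`, `sch`, `demo`) and omits local state and exceptions; the library's actual definitions may differ.

```haskell
-- A minimal resumption type over IO with round-robin interleaving.
-- Illustrative only; the names and definitions are assumptions.
import Data.IORef (modifyIORef, newIORef, readIORef)

-- | A resumption is either a computed value or an atomic action that,
-- when executed, returns the next resumption.
data Res a = Done a | Atom (IO (Res a))

-- | Lift a single IO action into a one-atom resumption.
atom :: IO a -> Res a
atom io = Atom (fmap Done io)

-- | Sequence two resumptions, discarding the result of the first.
andThen :: Res a -> Res b -> Res b
andThen (Done _) r = r
andThen (Atom io) r = Atom (fmap (`andThen` r) io)

-- | Round-robin scheduler: run one atom of the first thread, then move
-- that thread to the back of the queue, so threads yield after every atom.
sch :: [Res ()] -> IO ()
sch [] = return ()
sch (Done () : ts) = sch ts
sch (Atom io : ts) = do
  r <- io
  sch (ts ++ [r])

-- | Two threads that log their steps; returns the interleaved log.
demo :: IO [String]
demo = do
  logRef <- newIORef []
  let say s = atom (modifyIORef logRef (s :))
      threadA = say "a1" `andThen` say "a2"
      threadB = say "b1" `andThen` say "b2"
  sch [threadA, threadB]
  reverse <$> readIORef logRef
```

Running `demo` interleaves the two threads atom by atom. The order depends only on how many atoms each thread has executed, not on how long any individual atom takes, which is the property that removes the dependence on cache state.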

In principle, it is possible to force deterministic interleaving by means 
other than resumptions; in [37] we show an instruction-based scheduler 
that achieves this goal. However, Haskell's monad abstraction allows us 
to easily model resumptions as a library. This has two consequences.
First, and different from [37], it allows us to deploy a version of LIO
that does not rely on changes to the Haskell compiler. Importantly, LIO's
concurrency primitives can be modularly redefined, with little effort, to
operate on resumptions. Second, by effectively implementing
"instruction-based scheduling" at the level of library primitives, we can address
cache attacks not covered by the approach described in [37] (see Sec- 
tion 5). 

In practice, a library-level interleaved model of computations im- 
poses performance penalties. With this in mind, we provide primitives 
that allow developers to execute code in parallel, and means for securely 
controlling the granularity of atomic actions (which directly affects per- 
formance). 

Although our approach addresses internal timing attacks in the pres- 
ence of shared hardware, the library suffers from leaks that exploit the 
termination channel, i.e., programs can leak information by not termi- 
nating. However, this channel can only be exploited by brute-force at- 
tacks that leak data external to the program — an attacker cannot leak 
data within the program, as can be done with the internal timing covert 
channel. 



2 Cache Attacks on Concurrent IFC Systems 



Figure 1 shows an attack that leverages the timing effects of the
underlying cache in order to leak information through the internal
timing covert channel. In isolation, all three threads are secure.
However, when executed concurrently, threads B and C race to write to a
public, shared variable l. Importantly, the race outcome depends on the
value of the secret variable h, because thread A changes the contents of
the underlying CPU cache according to that value (e.g., by creating and
traversing a large array so as to fill the cache with new data).

[Thread A: fillCache(lowArray); if h == 0 then fillCache(highArray) else skip]

Fig. 1. Cache attack

The attack proceeds as follows. First, thread A fills the cache with the
contents of a public array lowArray. Then, depending on the secret
variable h, it evicts data from the cache (by filling it with arbitrary
data) or leaves it intact. Concurrently, public threads B and C delay
execution long enough for A to finish. Subsequently, thread B accesses
elements of the public array lowArray, and writes 0 to public variable
l; if the array has been evicted from the cache (h==0), the amount of
time it takes to perform the read, and thus the write to l, will be much
longer than if the array is still in the cache. Hence, to leak the value
of h, thread C simply needs to delay writing 1 to l for longer than
thread B's read takes when the cache still holds the public array, but
for less time than it takes to refill the cache with the array.
Observing the contents of l, the attacker directly learns the value
of h.

This simple attack has previously been demonstrated in [37], where
confidential data from the GitStar system [9], built atop LIO, was leaked.
Such attacks are not limited to LIO or IFC systems; cache-based attacks
against many systems, including cryptographic primitives (e.g., RSA and
AES), are well known [1, 23, 26, 40].

The next section details the use of resumptions in modeling con- 
currency at the programming language level by defining atomic steps, 
which are used as the thread scheduling quantum unit. By scheduling 
threads according to the number of executed atoms, the attack in Fig- 
ure 1 is eliminated. As in [37], this is the case because an atomic step 
runs till completion, regardless of the state of the cache. Hence, the tim- 
ing behavior of thread B, which was previously leaked to thread C by 
the time of preemption, is no longer disclosed. Specifically, the scheduling
of thread C's l := 1 does not depend on the time it takes thread B
to read the public array from the cache; rather it depends on the atomic 
actions, which do not depend on the cache state. In addition, our use of 
resumptions also eliminates attacks that exploit other timing perturba- 
tions produced by the underlying hardware, e.g., TLB misses, CPU bus 
contention, etc. 
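The difference between the two scheduling disciplines can be illustrated with a toy model of the race in Figure 1. This is only a sketch under simplifying assumptions: the types Step and Thread, the function names, and the time costs (1, 5 and 10 units) are all made up for illustration, and time-driven preemption is abstracted to "writes land in order of elapsed time".

```haskell
import Data.List (sortOn)

-- One step of a toy thread.
data Step
  = Work Int   -- compute for the given number of (made-up) time units
  | Write Int  -- write the given value to the public variable l

type Thread = [Step]

-- Thread B: read lowArray, then l := 0. The read is slow if A evicted it.
threadB :: Bool -> Thread
threadB evicted = [Work (if evicted then 10 else 1), Write 0]

-- Thread C: wait a fixed amount of time, then l := 1.
threadC :: Thread
threadC = [Work 5, Write 1]

-- Time-driven model: a write lands once its thread has consumed the time
-- of the preceding work, so writes are ordered by elapsed time.
writesByTime :: [Thread] -> [Int]
writesByTime ts = map snd (sortOn fst (concatMap (stamp 0) ts))
  where
    stamp :: Int -> Thread -> [(Int, Int)]
    stamp _ [] = []
    stamp t (Work c : rest) = stamp (t + c) rest
    stamp t (Write v : rest) = (t, v) : stamp t rest

-- Atom-driven model: round-robin over steps, ignoring their duration.
writesByAtoms :: [Thread] -> [Int]
writesByAtoms [] = []
writesByAtoms ([] : ts) = writesByAtoms ts
writesByAtoms ((Write v : rest) : ts) = v : writesByAtoms (ts ++ [rest])
writesByAtoms ((Work _ : rest) : ts) = writesByAtoms (ts ++ [rest])

-- The final content of l is the last write that lands.
finalL :: ([Thread] -> [Int]) -> Bool -> Int
finalL schedule evicted = last (schedule [threadB evicted, threadC])
```

In this model, finalL writesByTime yields 0 when the array was evicted and 1 otherwise, so l reveals whether h == 0, whereas finalL writesByAtoms yields 1 in both cases.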

3 Modeling Concurrency with Resumptions 

In pure functional languages, computations with side-effects are enco- 
ded as values of abstract data types called monads [22]. We use the type 
m a to denote computations that produce results of type a and may per- 
form side-effects in monad m. Different side-effects are often handled 
by different monads. In Haskell, there are monads for performing inputs
and outputs (monad IO), handling errors (monad Error), etc. The
IFC system LIO simply exposes a monad, LIO, in which security checks 
are performed before any IO side-effecting action. 

Resumptions are a simple approach to modeling interleaved computations
of concurrent programs. A resumption, which has the form
$res ::= x \mid a \triangleright res$, is either a computed value $x$ or
an atomic action $a$ followed by a new resumption $res$. Using this
notion, we can break down a program that is composed of a series of
instructions into a program that executes an atomic action and yields
control to a scheduler by giving it its subsequent resumption. For
example, program $P := i_1; i_2; i_3$, which performs three
side-effecting instructions in sequence, can be written as
$res_P := i_1; i_2 \triangleright i_3 \triangleright ()$, where () is a
value of a type with just one element, known as unit. Here, an atomic
action $a$ is any sequence of instructions. When executing $res_P$,
instructions $i_1$ and $i_2$ execute atomically, after which the thread
yields control back to the scheduler by supplying it the resumption
$res'_P := i_3 \triangleright ()$. At this point, the scheduler may
schedule atomic actions from other threads or execute $res'_P$ to resume
the execution of P. Suppose program $Q := j_1; j_2$, rewritten as
$j_1 \triangleright j_2 \triangleright ()$, runs concurrently with P.
Our concurrent execution of P and Q can be modeled with resumptions,
under a round-robin scheduler, by writing it as
$P \parallel Q := i_1; i_2 \triangleright j_1 \triangleright i_3 \triangleright j_2 \triangleright () \triangleright ()$.
In other words, resumptions allow us to implement a scheduler that
executes $i_1; i_2$, postponing the execution of $i_3$, and executing
atomic actions from Q in the interim.

    data Thread m a where
      Done :: a -> Thread m a
      Atom :: m (Thread m a) -> Thread m a
      Fork :: Thread m () -> Thread m a -> Thread m a

Fig. 2. Threads as Resumptions

    sch :: [Thread m ()] -> m ()
    sch [] = return ()
    sch ((Done _) : thrds) = sch thrds
    sch ((Atom m) : thrds) =
      do res <- m; sch (thrds ++ [res])
    sch ((Fork res res') : thrds) =
      sch ((res : thrds) ++ [res'])

Fig. 3. Simple round-robin scheduler

Implementing threads as resumptions As previously done in [10, 11], Fig. 2 
defines threads as resumptions at the programming language level. The 
thread type (Thread m a) is parametric in the resumption computa- 
tion value type (a) and the monad in which atomic actions execute (m) . 
(Symbol :: introduces type declarations and -> denotes function types.) 
The definition has several value constructors for a thread. Constructor 
Done captures computed values; a value Done a represents the computed
value a. Constructor Atom captures a resumption of the form
$a \triangleright res$. Specifically, Atom takes a monadic action of type
m (Thread m a), which denotes an atomic computation in monad m that
returns a new resumption as a result. In other words, Atom captures both
the atomic action that is being executed (a) and the subsequent
resumption (res). Finally, constructor Fork captures the action of
spawning new threads; value Fork res res' encodes a computation wherein a
new thread runs resumption res and the original thread continues as
res'. 2 As in the standard Haskell libraries, we assume that a fork does
not return the new thread's final value and thus the type of the new
thread/resumption is simply Thread m ().

1 In our implementation, atomic actions a (as referred to in
$a \triangleright res$) are actions described by the monad m.

Programming with resumptions Users do not build programs based on
resumptions by directly using the constructors of Thread m a. Instead,
they use the interface provided by Haskell monads:
return :: a -> Thread m a and
(>>=) :: Thread m a -> (a -> Thread m b) -> Thread m b. The expression
return a creates a resumption which consists of the computed value a,
i.e., it corresponds to Done a. The operator (>>=), called bind, is used
to sequence atomic computations. Specifically, the expression res >>= f
returns a resumption that consists of the execution of the atomic actions
in res followed by the atomic actions obtained from applying f to the
result produced by res. We sometimes use Haskell's do-notation to write
such monadic computations. For example, the expression
res >>= (\a -> return (a + 1)), i.e., actions described by the resumption
res followed by return (a + 1) where a is the result produced by res, is
written as do a <- res; return (a + 1).
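One way the monadic interface described here could be realized is sketched below. The instance bodies are an assumption consistent with the prose (return is Done, and bind threads the continuation through every atom); they are not taken from the library's source. The helpers atom and runSolo and the value example are hypothetical names introduced only for this illustration.

```haskell
{-# LANGUAGE GADTs #-}
import Data.Functor.Identity (Identity (..))

data Thread m a where
  Done :: a -> Thread m a
  Atom :: m (Thread m a) -> Thread m a
  Fork :: Thread m () -> Thread m a -> Thread m a

instance Monad m => Functor (Thread m) where
  fmap f t = t >>= (return . f)

instance Monad m => Applicative (Thread m) where
  pure = Done
  tf <*> tx = tf >>= \f -> fmap f tx

instance Monad m => Monad (Thread m) where
  -- A computed value is passed straight to the continuation.
  Done a >>= f = f a
  -- Run the atom, then keep binding inside the resumption it returns.
  Atom m >>= f = Atom (fmap (>>= f) m)
  -- Only the parent continues with f; the child's type is Thread m ().
  Fork child parent >>= f = Fork child (parent >>= f)

-- Hypothetical helper: lift one action of m into a single atomic step.
atom :: Monad m => m a -> Thread m a
atom m = Atom (fmap Done m)

-- Hypothetical helper: run one thread alone, ignoring forked children.
runSolo :: Monad m => Thread m a -> m a
runSolo (Done a) = return a
runSolo (Atom m) = m >>= runSolo
runSolo (Fork _ parent) = runSolo parent

-- The do-notation example from the text, with res = atom (Identity 41).
example :: Thread Identity Int
example = do
  a <- atom (Identity 41)
  return (a + 1)
```

With these definitions, runIdentity (runSolo example) evaluates to 42.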

Scheduling computations We use round-robin to schedule atomic actions 
of different threads. Fig. 3 shows our scheduler implemented as a func- 
tion from a list of threads into an interleaved computation in the monad 
m. The scheduler behaves as follows. If there is an empty list of resump- 
tions, the scheduler, and thus the program, terminates. If the resump- 
tion at the head of the list is a computed value (Done _), the scheduler 
removes it and continues scheduling the remaining threads (sch thrds). 
(Recall that we are primarily concerned with the side-effects produced 
by threads and not about their final values.) When the head of the list is 
an atomic step (Atom m), sch runs it (res <- m), takes the resulting
resumption (res), and appends it to the end of the thread list
(sch (thrds ++ [res])). Finally, when a thread is forked, i.e., the head
of the list is a Fork res res', the spawned resumption is placed at the
front of the list (res : thrds) and the parent's resumption res' is
appended to its end. Observe that in both of the latter cases the
scheduler is invoked recursively; hence we keep evaluating the program
until there are no more threads to schedule. We note that although we
choose a particular, simple scheduling approach, our results naturally
extend to a wide class of deterministic schedulers [28, 38].
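As an illustration, the scheduler of Fig. 3 can be run in the IO monad on the programs P and Q from the beginning of this section. This is a minimal sketch: the helper step, the shared log held in an IORef, and the name interleaving are assumptions made for the example and are not part of the library.

```haskell
{-# LANGUAGE GADTs #-}
import Data.IORef (IORef, modifyIORef, newIORef, readIORef)

data Thread m a where
  Done :: a -> Thread m a
  Atom :: m (Thread m a) -> Thread m a
  Fork :: Thread m () -> Thread m a -> Thread m a

-- Round-robin scheduler, following Fig. 3.
sch :: Monad m => [Thread m ()] -> m ()
sch [] = return ()
sch (Done _ : thrds) = sch thrds
sch (Atom m : thrds) = do
  res <- m
  sch (thrds ++ [res])
sch (Fork res res' : thrds) = sch ((res : thrds) ++ [res'])

-- Hypothetical helper: one atom that appends the given instruction names
-- to a shared log and then continues as `next`.
step :: IORef [String] -> [String] -> Thread IO () -> Thread IO ()
step logRef names next = Atom (modifyIORef logRef (++ names) >> return next)

-- P = i1; i2 |> i3 |> ()   and   Q = j1 |> j2 |> ()
interleaving :: IO [String]
interleaving = do
  logRef <- newIORef []
  let p = step logRef ["i1", "i2"] (step logRef ["i3"] (Done ()))
      q = step logRef ["j1"] (step logRef ["j2"] (Done ()))
  sch [p, q]
  readIORef logRef
```

Running interleaving returns ["i1","i2","j1","i3","j2"], the same order as the interleaving of P and Q given earlier in this section.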

4 Extending Resumptions with State and Exceptions 

LIO provides general programming language abstractions (e.g., state and
exceptions), which our library must preserve to retain expressiveness. To
this end, we extend the notion of resumptions and modify the scheduler
to handle thread local state and exceptions.

2 Spawning threads could also be represented by an equivalent constructor
Fork' :: Thread m () -> Thread m a; we choose Fork for pedagogical reasons.

Thread local state As described in [34], the LIO monad keeps track of a
current label, $L_{cur}$. This label is an upper bound on the labels of
all data in lexical scope. When a computation C, with current label
$L_C$, observes an object labeled $L_O$, C's label is raised to the least
upper bound or join of the two labels, written $L_C \sqcup L_O$.
Importantly, the current label governs where the current computation can
write, what labels may be used when creating new channels or threads,
etc. For example, after reading an object O, the computation should not
be able to write to a channel K if $L_O$ is more confidential than $L_K$;
this would potentially leak sensitive information (about O) into a less
sensitive channel. We write $L_C \sqsubseteq L_K$ when $L_K$ is at least
as confidential as $L_C$ and information is allowed to flow from the
computation to the channel.

Using our resumption definition of Section 3, we can model concur- 
rent LIO programs as values of type Thread LIO. Unfortunately, such 
programs are overly restrictive — since LIO threads would be sharing a 
single current label — and do not allow for the implementation of many 
important applications. Instead, and as done in the concurrent version 
of LIO [35], we track the state of each thread, independently, by modi- 
fying resumptions, and the scheduler, with the ability to context-switch 
threads with state. 

Figure 4 shows these changes to sch. The context-switching mechanism
relies on the fact that monad m is a state monad, i.e., provides
operations to retrieve (get) and set (put) its state. LIO is a state
monad, 3 where the state contains (among other things) $L_{cur}$.
Operation (>-) :: m b -> Thread m a -> Thread m a modifies a resumption
in such a way that its first atomic step (Atom) is extended with m b as
the first action. Here, Atom consists of executing the atomic step
(res <- m), taking a snapshot of the state (st <- get), and restoring it
when executing the thread again (put st >- res). Similarly, the case for
Fork saves the state before creating the child thread and restores it
when the parent thread executes again (put st >- res').

3 For simplicity of exposition, we use get and put. However, LIO only provides
such functions to trusted code. In fact, the monad LIO is not an instance of 
MonadState since this would allow untrusted code to arbitrarily modify the 
current label — a clear security violation. 
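The context-switching cases of Fig. 4 can be exercised with a small stand-in for LIO. Everything below is a sketch under stated assumptions: St is a hand-rolled state monad over IO whose state is an Int playing the role of the current label, the definition of (>-) on Done and Fork is a guess (the text only describes its effect on a first Atom), and observe, io and demo are hypothetical helpers. Each initial thread is explicitly started with put 0 so that every atom begins by restoring a label.

```haskell
{-# LANGUAGE GADTs #-}
import Data.IORef (modifyIORef, newIORef, readIORef)

data Thread m a where
  Done :: a -> Thread m a
  Atom :: m (Thread m a) -> Thread m a
  Fork :: Thread m () -> Thread m a -> Thread m a

-- Stand-in for LIO: state monad over IO; the Int models the current label.
newtype St a = St { runSt :: Int -> IO (a, Int) }

instance Functor St where
  fmap f (St g) = St (\s -> do (a, s') <- g s; return (f a, s'))

instance Applicative St where
  pure a = St (\s -> return (a, s))
  St f <*> St g = St (\s -> do (h, s1) <- f s; (a, s2) <- g s1; return (h a, s2))

instance Monad St where
  St g >>= k = St (\s -> do (a, s') <- g s; runSt (k a) s')

get :: St Int
get = St (\s -> return (s, s))

put :: Int -> St ()
put s = St (\_ -> return ((), s))

io :: IO a -> St a
io act = St (\s -> do a <- act; return (a, s))

-- Prefix the first atomic step with an action (assumed definition).
(>-) :: St b -> Thread St a -> Thread St a
act >- Atom m = Atom (act >> m)
_ >- Done a = Done a
act >- Fork child parent = Fork (act >- child) (act >- parent)

-- Scheduler with context switch of local state, following Fig. 4.
sch :: [Thread St ()] -> St ()
sch [] = return ()
sch (Done _ : thrds) = sch thrds
sch (Atom m : thrds) = do
  res <- m
  st <- get
  sch (thrds ++ [put st >- res])
sch (Fork res res' : thrds) = do
  st <- get
  sch ((res : thrds) ++ [put st >- res'])

-- Thread A raises its label to 2 and then observes it; thread B observes
-- its own label twice.
demo :: IO [(String, Int)]
demo = do
  logRef <- newIORef []
  let observe name next = Atom (do
        l <- get
        io (modifyIORef logRef (++ [(name, l)]))
        return next)
      threadA = Atom (put 2 >> return (observe "A" (Done ())))
      threadB = observe "B" (observe "B" (Done ()))
  _ <- runSt (sch (map (put 0 >-) [threadA, threadB])) 0
  readIORef logRef
```

Here demo returns [("B",0),("A",2),("B",0)]: thread B keeps observing label 0 even though thread A raised its own label to 2 in between, which is the per-thread behavior that the snapshot-and-restore mechanism is meant to provide.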



    sch ((Atom m) : thrds) =
      do res <- m
         st <- get
         sch (thrds ++ [put st >- res])
    sch ((Fork res res') : thrds) =
      do st <- get
         sch ((res : thrds) ++ [put st >- res'])

Fig. 4. Context-switch of local state

Exception handling As described in [36], LIO provides a secure way to
throw and catch exceptions, a feature crucial to many real-world
applications. Unfortunately, simply using LIO's throw and catch as atomic
actions, as in the case of local state, results in non-standard behavior. In
particular, in the interleaved computation produced by sch, an atomic 
action from a thread may throw an exception that would propagate 
outside the thread group and crash the program. Since we do not con- 
sider leaks due to termination, this does not impact security; however, 
it would have non-standard and restricted semantics. Hence, we first 
extend our scheduler to introduce a top-level catch for every spawned 
thread. 

Even with this extension, our approach remains quite limiting.
Specifically, LIO's catch is defined at the level of the monad LIO, i.e., 
it can only be used inside atomic steps. Therefore, catch-blocks are pre- 
vented from being extended beyond atomic actions. To address this lim- 
itation, we lift exception handling to work at the level of resumptions. 

To this end, we consider a monad m that handles exceptions, i.e., a
monad for which throw :: e -> m a and catch :: m a -> (e -> m a) -> m a,
where e is a type denoting exceptions, are accordingly defined. Function
throw throws the exception supplied as an argument. Function catch runs
the action supplied as the first argument (m a), and if an exception is
thrown, then executes the handler (e -> m a) with the value of the
exception passed as an argument. If no exceptions are raised, the result
of the computation (of type a) is simply returned.

    throw e = Atom (LIO.throw e)

    catch (Done a) _ = Done a
    catch (Atom a) handler =
      Atom (LIO.catch
              (do res <- a
                  return (catch res handler))
              (\e -> return (handler e)))
    catch (Fork res res') handler =
      Fork res (catch res' handler)

Fig. 5. Exception handling for resumptions

Figure 5 shows the definition of exception handling for resumptions. 
Since LIO defines throw and catch [36], we qualify these underlying 
functions with LIO to distinguish them from our resumption-level throw 
and catch. When throwing an exception, the resumption simply executes 
an atomic step that throws the exception in LIO (LIO.throw e).

The definitions of catch for Done and Fork are self-explanatory. The
most interesting case for catch is when the resumption is an Atom. Here,
catch applies LIO.catch step by step to each atomic action in the
sequence; this is necessary because exceptions can only be caught in the
LIO monad. As shown in Fig. 5, if no exception is thrown, we simply
return the resumption produced by the atomic action, wrapped in catch so
that the subsequent atoms remain guarded. Conversely, if an exception is
raised, LIO.catch will trigger the exception handler, which will return a
resumption by applying the top-level handler to the exception e. To
clarify, consider catching an exception in the resumption
$a_1 \triangleright a_2 \triangleright x$. Here, catch executes $a_1$ as
the first atomic step, and if no exception is raised, it executes $a_2$
as the next atomic step; on the other hand, if an exception is raised,
the resumption $a_2 \triangleright x$ is discarded and catch, instead,
executes the resumption produced when applying the exception handler to
the exception.
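The definitions of Fig. 5 can be tried out with IO standing in for LIO. In the sketch below, Control.Exception's throwIO and catch replace LIO.throw and LIO.catch (an assumption made only so that the example runs without the LIO library), the functions are renamed throwT and catchT to avoid clashing with the Prelude, and Boom, say and demo are hypothetical names.

```haskell
{-# LANGUAGE GADTs #-}
import qualified Control.Exception as E
import Data.IORef (modifyIORef, newIORef, readIORef)

data Thread m a where
  Done :: a -> Thread m a
  Atom :: m (Thread m a) -> Thread m a
  Fork :: Thread m () -> Thread m a -> Thread m a

-- Hypothetical exception type used by the example.
newtype Boom = Boom String deriving Show
instance E.Exception Boom

-- Resumption-level throw: a single atom that raises the exception.
throwT :: E.Exception e => e -> Thread IO a
throwT e = Atom (E.throwIO e)

-- Resumption-level catch: guard every atom in turn, as in Fig. 5.
catchT :: E.Exception e => Thread IO a -> (e -> Thread IO a) -> Thread IO a
catchT (Done a) _ = Done a
catchT (Atom a) handler =
  Atom (E.catch
          (do res <- a
              return (catchT res handler))
          (\e -> return (handler e)))
catchT (Fork res res') handler = Fork res (catchT res' handler)

-- Round-robin scheduler, following Fig. 3.
sch :: [Thread IO ()] -> IO ()
sch [] = return ()
sch (Done _ : thrds) = sch thrds
sch (Atom m : thrds) = do
  res <- m
  sch (thrds ++ [res])
sch (Fork res res' : thrds) = sch ((res : thrds) ++ [res'])

-- Atom a1 succeeds, the next atom raises Boom, so atom a2 never runs.
demo :: IO [String]
demo = do
  logRef <- newIORef []
  let say msg next = Atom (modifyIORef logRef (++ [msg]) >> return next)
      failing = Atom (do _ <- E.throwIO (Boom "boom"); return (say "a2" (Done ())))
      body = say "a1" failing
      handler (Boom msg) = say ("handled: " ++ msg) (Done ())
  sch [catchT body handler]
  readIORef logRef
```

In this run, demo returns ["a1","handled: boom"]: the atom that would have logged "a2" is discarded, and the handler's resumption is scheduled in its place.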

5 Performance Tuning 

Unsurprisingly, interleaving computations at the library level introduces
performance degradation. To alleviate this, we provide primitives that 
allow developers to control the granularity of atomic steps — fine-grained 
atoms allow for more flexible programs, but also lead to more context 
switches and thus performance degradation (as we spend more time 
context switching). Additionally, we provide a primitive for the parallel 
execution of pure code. We describe these features — which do not affect 
our security guarantees — below. 

Granularity of atomic steps To decrease the frequency of context switches,
programmers can treat a complex set of atoms (which are composed using
monadic bind) as a single atom using
singleAtom :: Thread m a -> Thread m a. This function takes a resumption
and "compresses" all its atomic steps into one. Although singleAtom may
seem unsafe, e.g., because we do not restrict threads from adjusting the
granularity of atomic steps according to secrets, in Section 6 we show
that this is not the case: it is the atomic execution of atoms,
regardless of their granularity, that ensures security.
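A possible definition of singleAtom is sketched below. The text only gives its type and intent, so the body is an assumption; in particular, stopping the compression at a Fork (and compressing the parent's remainder separately) is a design choice made for this sketch. The helpers collapse and countAtoms and the value three are hypothetical.

```haskell
{-# LANGUAGE GADTs #-}
import Data.Functor.Identity (Identity (..))

data Thread m a where
  Done :: a -> Thread m a
  Atom :: m (Thread m a) -> Thread m a
  Fork :: Thread m () -> Thread m a -> Thread m a

-- Compress all atomic steps of a resumption into one.
singleAtom :: Monad m => Thread m a -> Thread m a
singleAtom t = Atom (collapse t)

-- Run atoms back to back without yielding to the scheduler. A Fork needs
-- the scheduler, so it ends the compressed atom (assumption).
collapse :: Monad m => Thread m a -> m (Thread m a)
collapse (Atom m) = m >>= collapse
collapse (Fork child parent) = return (Fork child (singleAtom parent))
collapse done = return done

-- Hypothetical helper: number of scheduling steps a lone thread needs.
countAtoms :: Monad m => Thread m a -> m Int
countAtoms (Done _) = return 0
countAtoms (Atom m) = do
  next <- m
  n <- countAtoms next
  return (n + 1)
countAtoms (Fork _ parent) = countAtoms parent

-- A thread made of three trivial atoms.
three :: Thread Identity ()
three = Atom (Identity (Atom (Identity (Atom (Identity (Done ()))))))
```

With these definitions, runIdentity (countAtoms three) is 3, while runIdentity (countAtoms (singleAtom three)) is 1, i.e., the scheduler would context-switch once instead of three times.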

Parallelism As in [37], we cannot run one scheduler sch per core to gain 
performance through parallelism. Threads running in parallel can still 
race to public resources, and are thus vulnerable to internal timing
attacks (that may, for example, rely on the L3 CPU cache). In principle,
it is possible to securely parallelize arbitrary side-effecting
computations if races (or their outcomes) to shared public resources are
eliminated. Similar to observational low-determinism [41], our library
could allow parallel
computations to compute on disjoint portions of the memory. However, 
whenever side-effecting computations follow parallel code, we would 
need to impose synchronization barriers to enforce that all side-effects 
are performed in a pre-determined order. It is precisely this order, and 
LIO's safe side-effecting primitives for shared-resources, that hides the 
outcome of any potentially dangerous parallel race. In this paper, we
focus on executing pure code in parallel; we leave side-effecting code to
future work.

Pure computations, by definition, cannot introduce races to shared
resources since they do not produce side effects. 4 To consider such
computations, we simply extend the definition of Thread with a new
constructor: Parallel :: pure b -> (b -> Thread m a) -> Thread m a. Here,
pure is a monad that characterizes pure expressions, providing the
primitive runPure :: pure b -> b to obtain the value denoted by the code
given as argument. The monad pure could be instantiated to Par, a monad
that parallelizes pure computations in Haskell [21], with runPure set to
runPar. In a resumption, Parallel p f specifies that p is to be executed
in a separate Haskell thread, potentially running on a different core
than the interleaved computation. Once p produces a value x, f is applied
to x to produce the next resumption to execute.

    sch (Parallel p f : thrds) =
      do res <- sync (\v -> putMVar v (runPure p))
                     (\v -> takeMVar v)
                     f
         sch (thrds ++ [res])

Fig. 6. Scheduler for parallel computations

Figure 6 defines sch for pure computations, where interaction be- 
tween resumptions and Haskell-threads gets regulated. The scheduler 
relies on well-established synchronization primitives called MVars [13]. 
A value of type MVar is a mutable location that is either empty or con- 
tains a value. Function putMVar fills the MVar with a value if it is empty 
and blocks otherwise. Dually, takeMVar empties an MVar if it is full 
and returns the value; otherwise it blocks. Our scheduler implementation
sch simply takes the resumption produced by the sync function and
schedules it at the end of the thread pool. Function sync internally
creates a fresh MVar v and spawns a new Haskell thread to execute
putMVar v (runPure p). This action will store the result of the parallel
computation in the provided MVar. Subsequently, sync returns the
resumption res, whose first atomic action is to read the parallel
computation's result from the MVar (takeMVar v). At the time of reading,
if a value is not yet ready, the atomic action will block the whole
interleaved computation. However, once a value x is produced (in the
separate thread), f is applied to it and the execution proceeds with the
produced resumption (f x).
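The behavior of sync described above can be sketched with IO in place of LIO. Two simplifications are assumptions of this sketch: the pure computation is represented by a plain lazy value (forced to weak head normal form with $! in the spawned thread) rather than by a pure monad with runPure, and the body of sync is reconstructed from the prose, not taken from the library. The helpers say and demo are hypothetical.

```haskell
{-# LANGUAGE GADTs #-}
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (MVar, newEmptyMVar, putMVar, takeMVar)
import Data.IORef (modifyIORef, newIORef, readIORef)

data Thread m a where
  Done :: a -> Thread m a
  Atom :: m (Thread m a) -> Thread m a
  Fork :: Thread m () -> Thread m a -> Thread m a
  -- Simplification: a lazy value of type b stands in for `pure b`.
  Parallel :: b -> (b -> Thread m a) -> Thread m a

-- Spawn the producer in a new Haskell thread and return a resumption
-- whose first atom waits for the result and then continues with f.
sync :: (MVar b -> IO ()) -> (MVar b -> IO b) -> (b -> Thread IO a)
     -> IO (Thread IO a)
sync producer consumer f = do
  v <- newEmptyMVar
  _ <- forkIO (producer v)
  return (Atom (fmap f (consumer v)))

sch :: [Thread IO ()] -> IO ()
sch [] = return ()
sch (Done _ : thrds) = sch thrds
sch (Atom m : thrds) = do
  res <- m
  sch (thrds ++ [res])
sch (Fork res res' : thrds) = sch ((res : thrds) ++ [res'])
sch (Parallel p f : thrds) = do
  -- `$!` forces p in the spawned thread, playing the role of runPure.
  res <- sync (\v -> putMVar v $! p) takeMVar f
  sch (thrds ++ [res])

demo :: IO [String]
demo = do
  logRef <- newIORef []
  let say msg next = Atom (modifyIORef logRef (++ [msg]) >> return next)
      heavy = Parallel (sum [1 .. 1000 :: Int])
                       (\s -> say ("sum=" ++ show s) (Done ()))
      other = say "other" (Done ())
  sch [heavy, other]
  readIORef logRef
```

In this sketch, demo returns ["other","sum=500500"]: the other thread's atom runs while the sum is computed in a separate Haskell thread, and the scheduler blocks on takeMVar only when it reaches the atom that needs the result.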



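4 In the case of Haskell, lazy evaluation may pose a challenge since whether or
not a thunk has been evaluated is indeed an effect on a cache [24]. Though our
resumption-based approach handles this for the single-core case, handling
this in general is part of our ongoing work.

(Done)
$$\langle \Sigma, \mathsf{sch}\ (\mathsf{Done}\ x : t_s) \rangle \longrightarrow \langle \Sigma, \mathsf{sch}\ t_s \rangle$$

(Atom)
$$\frac{\langle \Sigma, m \rangle \longrightarrow^{*} \langle \Sigma', (e)^{LIO} \rangle}{\langle \Sigma, \mathsf{sch}\ (\mathsf{Atom}\ (\mathsf{put}\ \Sigma.\mathsf{lbl} \mathbin{\texttt{>>}} m) : t_s) \rangle \longrightarrow \langle \Sigma', \mathsf{sch}\ (t_s \mathbin{\texttt{++}} [\mathsf{put}\ \Sigma'.\mathsf{lbl} \mathbin{\texttt{>-}} e]) \rangle}$$

(Fork)
$$\langle \Sigma, \mathsf{sch}\ (\mathsf{Fork}\ m_1\ m_2 : t_s) \rangle \longrightarrow \langle \Sigma, \mathsf{sch}\ ((m_1 : t_s) \mathbin{\texttt{++}} [\mathsf{put}\ \Sigma.\mathsf{lbl} \mathbin{\texttt{>-}} m_2]) \rangle$$

Fig. 7. Semantics for sch expressions.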

6 Soundness 

In this section, we extend the previous formalization of LIO [34] to model 
the semantics of our concurrency library. We present the syntax exten- 
sions that we require to model the behavior of the Thread monad: 

Expression: $e ::= \ldots \mid \mathsf{sch}\ e_s \mid \mathsf{Atom}\ e \mid \mathsf{Done}\ e \mid \mathsf{Fork}\ e\ e \mid \mathsf{Parallel}\ e\ e$

where $e_s$ is a list of expressions. For brevity we omit a full presentation
of the syntax and semantics, since we rely on previous results in order 
to prove the security property of our approach. The interested reader is 
referred to [6]. 

Expressions are the usual $\lambda$-calculus expressions with special
syntax for monadic effects and LIO operations. The syntax node sch $e_s$
denotes the scheduler running with the list of threads $e_s$ as its
thread pool. The nodes Atom e, Done e, Fork e e and Parallel e e
correspond to the constructors of the Thread data type. In what follows,
we will use metavariables x, m, p, t, v and f for different kinds of
expressions, namely values, monadic computations, pure computations,
threads, MVars and functions, respectively.

We consider a global environment $\Sigma$ which contains the current
label of the computation ($\Sigma.\mathsf{lbl}$), and also represents the
resources shared among all threads, such as mutable references. We start
from the one-step reduction relation 5
$\langle \Sigma, e \rangle \longrightarrow \langle \Sigma', e' \rangle$,
which has already been defined for LIO [34]. This relation represents a
single evaluation step from e to e', with $\Sigma$ as the initial
environment and $\Sigma'$ as the final one. Presented as an extension to
the $\longrightarrow$ relation, Figure 7 shows the reduction rules for
concurrent execution using sch. The configurations for this relation are
of the form $\langle \Sigma, \mathsf{sch}\ t_s \rangle$, where $\Sigma$
is a runtime environment and $t_s$ is a list of Thread computations. Note
that the computation in an Atom always begins with either
put $\Sigma.\mathsf{lbl}$ for some label $\Sigma.\mathsf{lbl}$, or with
takeMVar v for some MVar v. Rules (Done), (Atom), and (Fork) basically
behave like the corresponding equations in the definition of sch (see
Figures 3 and 4). In rule (Atom), the syntax node $(e)^{LIO}$ represents
an LIO computation that has produced expression e as its result. Although
sch applications should expand to their definitions, for brevity we show
the unfolding of the resulting expressions into the next recursive call.
This unfolding follows from repeated application of basic
$\lambda$-calculus reductions.

5 As in [35], we consider a version of $\longrightarrow$ which does not
include the operation toLabeled, since it is susceptible to internal
timing attacks.

(Seq)
$$\frac{\langle \Sigma, e \rangle \longrightarrow \langle \Sigma', e' \rangle \qquad P \Rightarrow P'}{\langle \Sigma, e \parallel P \rangle \hookrightarrow \langle \Sigma', e' \parallel P' \rangle}$$

(Pure)
$$\frac{P \Rightarrow P' \qquad v_s\ \text{fresh MVar} \qquad s = \Sigma.\mathsf{lbl}}{\langle \Sigma, \mathsf{sch}\ (\mathsf{Parallel}\ p\ f : t_s) \parallel P \rangle \hookrightarrow \langle \Sigma, \mathsf{sch}\ (t_s \mathbin{\texttt{++}} [\mathsf{Atom}\ (\mathsf{takeMVar}\ v_s \mathbin{\texttt{>>=}} f)]) \parallel P' \parallel (\mathsf{putMVar}\ v_s\ (\mathsf{runPure}\ p))_s \rangle}$$

(Sync)
$$\frac{P \Rightarrow P'}{\langle \Sigma, \mathsf{sch}\ (\mathsf{Atom}\ (\mathsf{takeMVar}\ v_s \mathbin{\texttt{>>=}} f) : t_s) \parallel (\mathsf{putMVar}\ v_s\ x)_s \parallel P \rangle \hookrightarrow \langle \Sigma, \mathsf{sch}\ (f\ x : t_s) \parallel P' \rangle}$$

Fig. 8. Semantics for sch expressions with parallel processes.

Figure 8 extends relation $\longrightarrow$ into $\hookrightarrow$ to
express pure parallel computations. The configurations for this relation
are of the form $\langle \Sigma, \mathsf{sch}\ t_s \parallel P \rangle$,
where P is an abstract process representing a pure computation that is
performed in parallel. These abstract processes would be reified as
native Haskell threads. The operator ($\parallel$), representing parallel
process composition, is commutative and associative.

As described in the previous section, when a Thread evaluates a 
Parallel computation, a new native Haskell thread should be spawned 
in order to run it. Rule (Pure) captures this intuition. A fresh MVar $v_s$
(where s is the current label) is used for synchronization between the
parent and the spawned thread. A process is denoted by putMVar $v_s$
followed by a pure expression, and it is also tagged with the security 
level of the thread that spawned it. 

Pure processes are evaluated in parallel with the main threads managed
by sch. The relation $\Rightarrow$ nondeterministically evaluates one
process in a parallel composition and is defined as follows.

$$\frac{\mathsf{runPure}\ p \longrightarrow x}{(\mathsf{putMVar}\ v_s\ (\mathsf{runPure}\ p))_s \parallel P \Rightarrow (\mathsf{putMVar}\ v_s\ x)_s \parallel P}$$

For simplicity, we consider the full evaluation of one process until it
yields a value as just one step, since the computations involved are pure
and therefore cannot leak data. Rule (Seq) in Figure 8 represents steps
where no parallel forking or synchronization is performed, so it executes
one $\longrightarrow$ step alongside a $\Rightarrow$ step.

Rule (Sync) models the synchronization barrier technique from Section 5.
When an Atom of the form (takeMVar $v_s$ >>= f) is evaluated, execution
blocks until the pure process with the corresponding MVar $v_s$ completes
its computation. After that, the process is removed and the scheduler
resumes execution.

Security guarantees We show that programs written using our library 
satisfy termination-insensitive non-interference, i.e., an attacker at level 
L cannot distinguish the results of programs that run with indistinguishable
inputs. This result has been previously established for the sequential
version of LIO [34]. As in [20, 31, 34], we prove this property by
using the term erasure technique. 

In this proof technique, we define function $\varepsilon_L$ in such a way
that $\varepsilon_L(e)$ contains only information below or equal to level
L, i.e., the function $\varepsilon_L$ replaces all the information more
sensitive than L or incomparable to L in e with a hole ($\bullet$). We
adapt the previous definition of $\varepsilon_L$ to handle the new
constructs in the library. In most of the cases, the erasure function is
simply applied homomorphically (e.g.,
$\varepsilon_L(e_1\ e_2) = \varepsilon_L(e_1)\ \varepsilon_L(e_2)$). For
sch expressions, the erasure function is mapped into the list; all
threads with a current label above L are removed from the pool
($\mathsf{filter}\ (\not\equiv \bullet)\ (\mathsf{map}\ \varepsilon_L\ t_s)$,
where $\equiv$ denotes syntactic equivalence). Analogously, erasure for a
parallel composition consists of removing all processes using an MVar
tagged with a level not below or equal to L. The computation performed in
a certain Atom is erased if the label is not below or equal to L. This is
given by

$$\varepsilon_L(\mathsf{Atom}\ (\mathsf{put}\ s \mathbin{\texttt{>>}} m)) = \begin{cases} \bullet & \text{if } s \not\sqsubseteq L \\ \mathsf{Atom}\ (\mathsf{put}\ s \mathbin{\texttt{>>}} \varepsilon_L(m)) & \text{otherwise} \end{cases}$$

A similar rule exists for expressions of the form
Atom (takeMVar $v_s$ >>= f). Note that this relies on the fact that an
atom must be of the form Atom (put s >> m) or Atom (takeMVar $v_s$ >>= f)
by construction. For expressions of the form Parallel p f, erasure
behaves homomorphically, i.e.,
$\varepsilon_L(\mathsf{Parallel}\ p\ f) = \mathsf{Parallel}\ \varepsilon_L(p)\ (\varepsilon_L \circ f)$.

Following the definition of the erasure function, we introduce the
evaluation relation $\hookrightarrow_L$ as follows:
$\langle \Sigma, t \parallel P \rangle \hookrightarrow_L \varepsilon_L(\langle \Sigma', t' \parallel P' \rangle)$
if
$\langle \Sigma, t \parallel P \rangle \hookrightarrow \langle \Sigma', t' \parallel P' \rangle$.
The relation $\hookrightarrow_L$ guarantees that confidential data, i.e.,
data not below or equal to level L, is erased as soon as it is created.
We write $\hookrightarrow_L^{*}$ for the reflexive and transitive closure
of $\hookrightarrow_L$.

In order to prove non-interference, we will establish a simulation
relation between $\hookrightarrow^{*}$ and $\hookrightarrow_L^{*}$
through the erasure function: erasing all secret data and then taking
evaluation steps in $\hookrightarrow_L$ is equivalent to taking steps in
$\hookrightarrow$ first, and then erasing all secret values in the
resulting configuration. In the rest of this section, we consider
well-typed terms to avoid stuck configurations.

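Proposition 1 (Many-step simulation). If
$\langle \Sigma, \mathsf{sch}\ t_s \parallel P \rangle \hookrightarrow^{*} \langle \Sigma', \mathsf{sch}\ t'_s \parallel P' \rangle$,
then it holds that
$\varepsilon_L(\langle \Sigma, \mathsf{sch}\ t_s \parallel P \rangle) \hookrightarrow_L^{*} \varepsilon_L(\langle \Sigma', \mathsf{sch}\ t'_s \parallel P' \rangle)$.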

The L-equivalence relation $\approx_L$ is an equivalence relation between
configurations and their parts, defined as the equivalence kernel of the
erasure function $\varepsilon_L$:
$\langle \Sigma, \mathsf{sch}\ t_s \parallel P \rangle \approx_L \langle \Sigma', \mathsf{sch}\ r_s \parallel Q \rangle$
iff
$\varepsilon_L(\langle \Sigma, \mathsf{sch}\ t_s \parallel P \rangle) = \varepsilon_L(\langle \Sigma', \mathsf{sch}\ r_s \parallel Q \rangle)$.
If two configurations are L-equivalent, they agree on all data below or
at level L, i.e., an attacker at level L is not able to distinguish them.
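The idea of L-equivalence as the kernel of erasure can be illustrated on a deliberately tiny model. This sketch is not the formal definition: it assumes a two-point lattice (Low below High), abstracts each thread to its current label plus a description, and erases by filtering the pool. The names Label, Cfg, erase and lEquiv are hypothetical.

```haskell
-- Two-point security lattice; the derived Ord gives Low <= High.
data Label = Low | High deriving (Eq, Ord, Show)

-- A configuration abstracted to its thread pool: each thread is reduced
-- to its current label and a textual description of what it does.
newtype Cfg = Cfg [(Label, String)] deriving (Eq, Show)

-- Erasure at level l: drop every thread whose label is not below or
-- equal to l.
erase :: Label -> Cfg -> Cfg
erase l (Cfg pool) = Cfg (filter (\(s, _) -> s <= l) pool)

-- L-equivalence: two configurations are equivalent at l exactly when
-- their erasures coincide.
lEquiv :: Label -> Cfg -> Cfg -> Bool
lEquiv l c1 c2 = erase l c1 == erase l c2

-- Two configurations that differ only in a High thread.
cfg1, cfg2 :: Cfg
cfg1 = Cfg [(Low, "write l"), (High, "read h = 0")]
cfg2 = Cfg [(Low, "write l"), (High, "read h = 1")]
```

Here lEquiv Low cfg1 cfg2 is True, since the configurations differ only in data that a Low attacker cannot see, while lEquiv High cfg1 cfg2 is False.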

The next theorem shows the non-interference property. The configuration
$\langle \Sigma, \mathsf{sch}\ [] \rangle$ represents a final
configuration, where the thread pool is empty and there are no more
threads to run.

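Theorem 1 (Termination-insensitive non-interference). Given a
computation e, an attacker at level L, and runtime environments
$\Sigma_1$ and $\Sigma_2$, then for all inputs $e_1$, $e_2$ such that
$e_1 \approx_L e_2$, if
$\langle \Sigma_1, \mathsf{sch}\ [e\ e_1] \rangle \hookrightarrow^{*} \langle \Sigma'_1, \mathsf{sch}\ [] \rangle$
and
$\langle \Sigma_2, \mathsf{sch}\ [e\ e_2] \rangle \hookrightarrow^{*} \langle \Sigma'_2, \mathsf{sch}\ [] \rangle$,
then
$\langle \Sigma'_1, \mathsf{sch}\ [] \rangle \approx_L \langle \Sigma'_2, \mathsf{sch}\ [] \rangle$.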

This theorem essentially states that if we take two executions from
configurations $\langle \Sigma_1, \mathsf{sch}\ [e\ e_1] \rangle$ and
$\langle \Sigma_2, \mathsf{sch}\ [e\ e_2] \rangle$, which are
indistinguishable to an attacker at level L ($e_1 \approx_L e_2$), then
the final configurations for the executions
$\langle \Sigma'_1, \mathsf{sch}\ [] \rangle$ and
$\langle \Sigma'_2, \mathsf{sch}\ [] \rangle$ are also indistinguishable
to the attacker
($\langle \Sigma'_1, \mathsf{sch}\ [] \rangle \approx_L \langle \Sigma'_2, \mathsf{sch}\ [] \rangle$).
This result generalizes when constructors Done, Atom, and Fork involve
exception handling (see Figure 5). The reason for this lies in the fact
that catch and throw defer all exception handling to LIO.throw and
LIO.catch, which have been proved secure in [36].

7 Case study: Classifying location data 

We evaluated the trade-offs between performance, expressiveness and 
security through an LIO case study. We implemented an untrusted ap- 
plication that performs K-means clustering on sensitive user location 
data, in order to classify GPS-enabled cell phones into locations on a map,
e.g., home, work, gym, etc. Importantly, this app is untrusted yet computes
clusters for users without leaking their location (e.g., the fact that
Alice frequents the local chapter of the Rebel Alliance). K-means is a
particularly interesting application for evaluating our scheduler as the 
classification phase is highly parallelizable — each data point can be eval- 
uated independently. 

We implemented and benchmarked three versions of this app: (i) A
baseline implementation that does not use our scheduler and parallelizes
the computation using Haskell's Par monad [21]. Since in this
implementation the scheduler is not modeled using resumptions, it
leverages the parallelism features of Par. (ii) An implementation in the
resumption-based scheduler, but pinned to a single core (therefore not
taking advantage of parallelizing pure computations). (iii) A parallel
implementation using the resumption-based scheduler. This implementation
expresses the exact same computation as the first one, but is not
vulnerable to cache-based leaks, even in the face of parallel execution
on multiple cores.

We ran each implementation against one month of randomly gener- 
ated data, where data points are collected each minute (so, 43200 data 
points in total). All experiments were run ten times on a machine with 
two 4-core (with hyperthreading) 2.4 GHz Intel Xeon processors and 48 GB
of RAM. The secure, but non-parallel implementation using resumptions
performed extremely poorly. With mean 204.55 seconds (standard deviation
7.19 seconds), it was roughly twelve times slower than the baseline at
17.17 seconds (standard deviation 1.16 seconds), since
204.55 / 17.17 is approximately 11.9. This was expected since K-means is
highly parallelizable. Conversely, the parallel implementation in the
resumption-based scheduler performed comparably to the baseline, at 17.83
seconds (standard deviation 1.15 seconds).

To state any conclusive facts on the overhead introduced by our library,
it is necessary to perform a more exhaustive analysis involving more
than a single case study.

8 Related work 

Cryptosystems Attacks exploiting the CPU cache have been considered 
by the cryptographic community [16]. Our attacker model is weaker 
than the one typically considered in cryptosystems, i.e., attackers with 
access to a stopwatch. As a countermeasure, several authors propose 
partitioning the cache (e.g., [25]), which often requires special hardware. 
Other countermeasures (e.g. [23]) are mainly implementation-specific 
and, while applicable to cryptographic primitives, they do not easily 
generalize to arbitrary code (as required in our scenario). 

Resumptions While CPS can be used to model concurrency in a functional
setting [7], resumptions are often simpler to reason about when
considering security guarantees [10, 11]. The closest related work is that
of Harrison and Hook [11]; inspired by a secure multi-level operating
system, the authors utilize resumptions to model interleaving and layered
state monads to represent threads. Every layer corresponds to an
individual thread, thereby providing a notion of local state. Since we do
not require such generality, we simply adapt the scheduler to context-switch
the local state underlying the LIO monad. We believe that the
authors overlooked the power of resumptions to deal with timing
perturbations produced by the underlying hardware. In [10], Harrison hints
that resumptions could handle exceptions; in this work, we substantiate
his claim by describing precisely how to implement throw and catch.
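
To make the idea of resumptions concrete, the following is a minimal sketch
of a resumption type with a round-robin scheduler. It is not the library's
actual definition: the names Res, Done, Atom, atom, roundRobin and demo are
hypothetical and chosen only for illustration, and state and exceptions are
left out. Each Atom is one indivisible step, so the interleaving depends only
on the number of atoms and not on how long each one takes.

```haskell
import Data.IORef (modifyIORef, newIORef, readIORef)

-- A resumption is either finished, or one atomic step of the underlying
-- monad that yields the rest of the computation. (Hypothetical names.)
data Res m a = Done a | Atom (m (Res m a))

instance Monad m => Functor (Res m) where
  fmap f (Done a) = Done (f a)
  fmap f (Atom m) = Atom (fmap (fmap f) m)

instance Monad m => Applicative (Res m) where
  pure = Done
  Done f <*> r = fmap f r
  Atom m <*> r = Atom (fmap (<*> r) m)

instance Monad m => Monad (Res m) where
  Done a >>= k = k a
  Atom m >>= k = Atom (fmap (>>= k) m)

-- Lift one action of the underlying monad into a single atomic step.
atom :: Monad m => m a -> Res m a
atom m = Atom (fmap Done m)

-- Round-robin scheduling: run exactly one atom of the first thread,
-- then move that thread to the back of the queue.
roundRobin :: Monad m => [Res m ()] -> m ()
roundRobin []             = return ()
roundRobin (Done () : ts) = roundRobin ts
roundRobin (Atom m : ts)  = do
  t' <- m
  roundRobin (ts ++ [t'])

-- Two threads that append to a shared log, one entry per atom.
demo :: IO [String]
demo = do
  logRef <- newIORef []
  let say s   = atom (modifyIORef logRef (s :))
      threadA = say "A1" >> say "A2" >> say "A3"
      threadB = say "B1" >> say "B2"
  roundRobin [threadA, threadB]
  reverse <$> readIORef logRef

main :: IO ()
main = demo >>= print
```

In this sketch the threads alternate one atom at a time, so the log is
determined by the program structure alone, which is the property that makes
the approach insensitive to cache-induced timing perturbations.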

Language-based IFC There has been a considerable amount of literature on
applying programming-language techniques to address the internal
timing covert channel (e.g. [28, 33, 35, 39, 41]). Many of these works assume
that the execution of a single step, i.e., a reduction step in some
transition system, is performed in a single unit of time. This assumption
is often made so that security guarantees can be easily shown using
programming language semantics. Unfortunately, the presence of the CPU
cache (or other hardware shared state) breaks this correspondence, making
cache attacks viable. Our resumption approach establishes a
correspondence between atomic steps at the implementation level and
reduction steps in a transition system. Previous approaches can leverage this
technique when implementing systems, so as to avoid the reappearance of
the internal timing channel.

Agat [2] presents a code transformation for sequential programs such 
that both code paths of a branch have the same memory access pat- 
tern. This transformation has been adapted in different works (e.g., [32]). 
Agat's approach, however, focuses on avoiding attacks relying on the 
data cache, while leaving the instruction cache unattended. 

Russo and Sabelfeld [29] consider non-interference for concurrent
while-like programs under cooperative and deterministic scheduling.
Similar to our work, this approach eliminates cache attacks by restricting
the use of yields. In contrast, our library targets a richer programming
language, i.e., it supports parallelism, exceptions, and dynamically
adjusting the granularity of atomic actions.

Secure multi-execution [8] preserves confidentiality of data by exe- 
cuting the same sequential program several times, one for each secu- 
rity level. In this scenario, cache-based attacks can only be removed in 
specific configurations [14] (e.g., when there are as many CPU cores as 
security levels). 

Hedin and Sands [12] present a type system for preventing external
timing attacks for bytecode. Their semantics is augmented to incorporate
history, which enables the modeling of cache effects. Zhang
et al. [42] provide a method for mitigating external events when their
timing behavior could be affected by the underlying hardware. Their
semantics focusses on sequential programs, wherein attacks due to the
cache arise in the form of externally visible events. Their solution is
directly applicable to our system when considering external events.

System security In order to achieve strong isolation, Barthe et al. [3] 
present a model of virtualization which flushes the cache upon switch- 
ing between guest operating systems. Flushing the cache in such scenar- 
ios is common and does not impact the already-costly context-switch. 
Although this technique addresses attacks that leverage the CPU cache, 
it does not address the case where a shared resource cannot be controlled 
(e.g., CPU bus). 

Allowing some information leakage, Köpf et al. [17] combine abstract
interpretation and quantitative information-flow to analyze leakage
bounds for cache attacks. Kim et al. [15] propose StealthMem, a system-level
protection against cache attacks. StealthMem allows programs
to allocate memory that does not get evicted from the cache. StealthMem
is capable of enforcing confidentiality for a stronger attacker model than
ours, i.e., they consider programs with access to a stopwatch and running
on multiple cores. However, we suspect that StealthMem is not adequate
for scenarios with arbitrarily complex security lattices, wherein
not flushing the cache would be overly restrictive.

9 Conclusion 

We present a library for LIO that leverages resumptions to expose
concurrency. Our resumption-based approach and "instruction"- or
atom-based scheduling remove internal timing leaks induced by timing
perturbations of the underlying hardware. We extend the notion of
resumptions to support state and exceptions and provide a scheduler which
context-switches programs with such features. Though our approach
eliminates internal-timing attacks that leverage hardware caches,
library-level threading imposes considerable performance penalties. To address
this, we provide programmers with a safe means of controlling the
context-switching frequency, i.e., allowing for the adjustment of the "size"
of atomic actions. Moreover, we provide a primitive for spawning
computations in parallel, a novel feature not previously available in IFC
tools. We prove soundness of our approach and implement a simple
case study to demonstrate its use. Our techniques can be adapted to
other Haskell-like IFC systems beyond LIO. The library, case study, and
details of the proofs can be found at [6].

LAZY PROGRAMS LEAK SECRETS



Pablo Buiras, Alejandro Russo 



Abstract. To preserve confidentiality, information-flow control re- 
stricts how untrusted code handles secret data. While promising, 
IFC systems are not perfect; they can still leak sensitive informa- 
tion via covert channels. In this work, we describe a novel exploit 
of lazy evaluation to reveal secrets in IFC systems. Specifically, we 
show that lazy evaluation might transport information through 
the internal timing covert channel, a channel present in systems with 
concurrency and shared resources. We illustrate our claim with an 
attack for LIO, a concurrent IFC system for Haskell. We propose a 
countermeasure based on restricting the implicit sharing caused
by lazy evaluation.

1 Introduction

Information-flow control (IFC) permits untrusted code to safely operate 
on secret data. By tracking how data is disseminated inside programs, 
IFC can avoid leaking secrets into public channels, a policy known as
non-interference [4]. Despite being promising, IFC systems are not
flawless; the presence of covert channels still allows attackers to leak
sensitive information.

Covert channels arise when programming language features are mis- 
used to leak information [6]. The tolerance to such channels is deter- 
mined by their bandwidth and how easy it is to exploit them. For in- 
stance, the termination covert channel, which exploits divergence of pro- 
grams, has a different bandwidth in systems with intermediate outputs 
than in batch processes [1]. 

Lazy evaluation is the default evaluation strategy of the purely func- 
tional programming language Haskell. This evaluation strategy has two 
distinctive features which can be used together to reveal secrets. Firstly, 
since it is a form of non-strict evaluation, it delays the evaluation of func- 
tion/constructor arguments and let-bound identifiers until their denoted 
values are needed. Secondly, when the evaluation of such expressions is 
required, their resulting value is stored (cached) for subsequent uses of 
the same expression, a feature known as sharing or memoisation. The
combination is known as call-by-need semantics or simply lazy evaluation. In Haskell, a
thunk, also known as a delayed computation, is a parameterless closure 
created to prevent the evaluation of an expression until it is required 
at a later time. The process of evaluating a thunk is known as forcing. 
While lazy evaluation does not affect the denotation of expressions with 
respect to non-strict semantics, it affects the timing behaviour of pro- 
grams. For instance, if a function argument is used more than once in 
the body of a function, it is almost always faster to use lazy evaluation 
as opposed to call-by-name, since it avoids re-evaluating every occur- 
rence of the argument. 
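
The difference between sharing and call-by-name can be observed with a small
experiment. The sketch below counts how often an instrumented computation is
actually run; the names evalCount, expensive, byNeed, byName and countEvals
are hypothetical, and the use of unsafePerformIO is only a measuring device,
not something a real program should rely on. Optimisation is switched off for
the module because GHC's optimiser may otherwise share the repeated
expression on its own.

```haskell
{-# OPTIONS_GHC -O0 #-}
import Data.IORef (IORef, modifyIORef', newIORef, readIORef, writeIORef)
import System.IO.Unsafe (unsafePerformIO)

-- Global counter of how many times `expensive` is actually run.
{-# NOINLINE evalCount #-}
evalCount :: IORef Int
evalCount = unsafePerformIO (newIORef 0)

-- A pure-looking computation instrumented to record each evaluation.
{-# NOINLINE expensive #-}
expensive :: Int -> Int
expensive n = unsafePerformIO $ do
  modifyIORef' evalCount (+ 1)
  return (sum [1 .. n])

-- Call-by-need: the argument is a thunk that is forced once and then shared.
byNeed :: Int -> Int
byNeed x = x + x

-- Call-by-name, simulated with an explicit function re-run at every use.
byName :: (() -> Int) -> Int
byName f = f () + f ()

-- Run an action and report its result together with the number of
-- evaluations of `expensive` that it triggered.
countEvals :: IO Int -> IO (Int, Int)
countEvals act = do
  writeIORef evalCount 0
  r <- act
  c <- readIORef evalCount
  return (r, c)

main :: IO ()
main = do
  need <- countEvals (return $! byNeed (expensive 1000))
  name <- countEvals (return $! byName (\() -> expensive 1000))
  print need
  print name
```

Without optimisation, the first pair should report a single evaluation and
the second pair two, while both compute the same value. That difference in
work, invisible in the result, is exactly what a timing channel can observe.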

From a security point of view, it is unclear what type of semantics 
(non-strict versus strict) is desirable in order to deal with covert chan- 
nels. In sequential settings, Sabelfeld and Sands [10] suggest that a non- 
strict semantics might be intrinsically safer than a strict one. This ob- 
servation is based on the ability to exploit the termination covert channel. 
Although it could avoid termination leaks, lazy evaluation can compromise
security in other ways. For instance, Rafnsson et al. [9] describe how
to exploit the Java (lazy) class initialisation process to reveal secrets. Not
surprisingly, lazy evaluation might also reveal secrets through the
external timing covert channel. This channel involves externally measuring
the time used to complete operations that may depend on secret data.

More interestingly, and totally unexplored until this work, lazy
evaluation might transport information through the internal timing covert
channel. This covert channel arises by the mere presence of concurrency
and shared resources. Malicious code can exploit it by setting up threads
to race for a public shared resource and, depending on the secret,
affecting their timing behaviour to determine the winner. With lazy
evaluation in place, thunks become shared resources and forcing their
evaluation corresponds to affecting the threads' timing behaviour: subsequent
evaluations of previously forced thunks take practically no time.

We present an attack for LIO [12], a concurrent IFC system for Has- 
kell, that leverages lazy evaluation to leak secrets. LIO presents coun- 
termeasures for internal timing leaks based on programming language 
level abstractions. Since LIO is embedded in Haskell as a library, lazy 
evaluation, as a feature that primarily affects pure values, is handled by 
the host language. Lazy evaluation is essentially built into Haskell's in- 
ternals, hence there are no programming language-level mechanisms for 
inspecting or creating thunks that could be used to implement a
countermeasure. Thunks for pure values are transparently injected into LIO
computations, so the library cannot explicitly consider
whether they have been memoised at any given time.

This paper is organised as follows. Section 2 briefly recaps the ba- 
sics of LIO. Section 3 presents the attack. Section 4 describes a possible 
countermeasure. Conclusions are drawn in Section 5. 

2 LIO: a concurrent IFC system for Haskell 

In purely functional languages, computations with side-effects are
encoded as values of abstract data types called monads [8]. In Haskell,
there are monads for performing inputs and outputs (monad IO),
handling errors (monad Error), etc. The IFC system LIO is simply another
monad in which security checks are performed before side-effects are
carried out.

The LIO monad keeps track of a current label. This label is an upper
bound on the labels of all data in lexical scope. When a computation C,
with current label $L_C$, observes an object labelled $L_O$, C's label is raised
to the least upper bound or join of the two labels, written $L_C \sqcup L_O$.
Importantly, the current label governs where the current computation can
write, what labels may be used when creating new channels or threads,
etc. For example, after reading an object O, the computation should not
be able to write to a channel K if $L_O$ is more confidential than $L_K$,
since this would potentially leak sensitive information (about O) into a less
sensitive channel.
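
The floating-label discipline described above can be sketched with a toy
monad over the two-point lattice. This is a simplified model written for
illustration and not the real LIO API: the types Label, Labeled and LIO below
are local stand-ins, and lub, canFlowTo, writeTo, leaky and safe are
hypothetical names. The monad threads the current label and aborts when a
write would violate the policy.

```haskell
-- Two-point lattice: L (public) flows to H (secret), but not vice versa.
data Label = L | H deriving (Eq, Ord, Show)

-- Least upper bound (join) of two labels.
lub :: Label -> Label -> Label
lub = max

canFlowTo :: Label -> Label -> Bool
canFlowTo = (<=)

-- A value protected by its own label.
data Labeled a = Labeled Label a

-- Toy LIO: a computation from the current label to either an error
-- or a result paired with the (possibly raised) current label.
newtype LIO a = LIO { runLIO :: Label -> Either String (a, Label) }

instance Functor LIO where
  fmap f (LIO g) = LIO $ \l -> case g l of
    Left e        -> Left e
    Right (a, l') -> Right (f a, l')

instance Applicative LIO where
  pure a = LIO $ \l -> Right (a, l)
  LIO f <*> LIO g = LIO $ \l -> case f l of
    Left e        -> Left e
    Right (h, l') -> case g l' of
      Left e         -> Left e
      Right (a, l'') -> Right (h a, l'')

instance Monad LIO where
  LIO g >>= k = LIO $ \l -> case g l of
    Left e        -> Left e
    Right (a, l') -> runLIO (k a) l'

-- Observing a labelled value raises the current label to the join.
unlabel :: Labeled a -> LIO a
unlabel (Labeled lv a) = LIO $ \cur -> Right (a, lub cur lv)

-- Writing to a channel labelled lk is allowed only if the current
-- label can flow to lk. The message itself is ignored in this model.
writeTo :: Label -> String -> LIO ()
writeTo lk _msg = LIO $ \cur ->
  if cur `canFlowTo` lk
    then Right ((), cur)
    else Left ("write to " ++ show lk ++ " refused at current label " ++ show cur)

-- Reads a secret and then tries to write to a public channel.
leaky :: LIO ()
leaky = do
  s <- unlabel (Labeled H (42 :: Int))
  writeTo L (show s)

-- Writes publicly first, then reads the secret and writes to a secret channel.
safe :: LIO ()
safe = do
  writeTo L "hello"
  _ <- unlabel (Labeled H (42 :: Int))
  writeTo H "secret-dependent"

main :: IO ()
main = do
  print (runLIO leaky L)
  print (runLIO safe L)
```

Starting from current label L, leaky is rejected because the label has
already floated to H when the public write is attempted, whereas safe
succeeds and ends with current label H.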

Since the current label protects all the variables in scope, in practical 
programs we need a way of manipulating differently-labelled data with- 
out monotonically increasing the current label. For this purpose, LIO 
provides explicit references to labelled, immutable data through a
parametric data type called Labeled. A locally accessible symbol can bind,
for example, a value of type Labeled l Int (for some label type l), which
contains an Int protected by a label different from the current one. Function
unlabel :: Labeled l a -> LIO l a¹ brings the labelled value into the current
lexical scope and updates the current label accordingly.

    attack :: LMVar LH Int -> Labeled LH Int -> LIO LH Int
    attack lmv secret
      = do let thunk = [1 .. constant] :: [Int]
           -- Thread C
           forkLIO (do s <- unlabel secret
                       when (s /= 0) (do n <- traverse thunk
                                         when (n > 0) (return ())))
           threadDelay delay_C
           -- Thread A
           forkLIO (do n <- traverse thunk
                       when (n > 0) (putLMVar lmv 1))
           -- Thread B
           forkLIO (do threadDelay delay_B
                       putLMVar lmv 0)
           w <- takeLMVar lmv
           _ <- takeLMVar lmv
           return w

Fig. 1: Attack exploiting lazy evaluation

LIO also includes IFC-aware versions of well-established synchroni- 
sation primitives known as MVars [5]. A value of type LMVar is a muta- 
ble location that is either empty or contains a value. Function putLMVar 
fills the LMVar with a value if it is empty and blocks otherwise. Dually, 
takeLMVar empties an LMVar if it is full and blocks otherwise.

3 A lazy attack for LIO 

Figure 1 shows the attack for LIO. The code essentially implements an
internal timing attack [11] which leverages lazy evaluation to affect the
timing behaviour of threads. We assume the classic two-point lattice
(of type LH) where security levels L and H denote public and secret
data, respectively, and the only disallowed flow is the one from H to
L. Function attack takes a public, shared LMVar lmv, and a labelled
boolean secret (encoded as an integer for simplicity). The goal of attack
is to return a public integer equal to secret, thus exposing an LIO
vulnerability. In isolation, all the threads are secure. When executed
concurrently, however, secret gets leaked into lmv. For simplicity, we use
threadDelay n, which causes a thread to sleep for n microseconds, to
exploit the race to lmv; if such an operation were not allowed, using a
loop would work equally well.

¹ Symbol :: introduces type declarations and -> denotes function types.

The attack proceeds as follows. Threads A and B do not start running
until thread C finishes. This effect can be easily achieved by adjusting
the parameter delay_C. The role of thread C is to force the evaluation of
the list thunk when the value of secret is not zero (s /= 0). To that end,
function traverse goes over thunk, returning one of its elements.
Condition n > 0 always holds and it is only used to force Haskell to fully
evaluate the closure returned by traverse. Threads A and B will eventually
start racing. Thread A executes the command traverse thunk
before writing the constant 1 into lmv (putLMVar lmv 1). Thread B delays
writing 0 into lmv (putLMVar lmv 0) by some (carefully chosen) time
delay_B. If s /= 0, thunk will have already been evaluated when thread
A traverses its elements, thus taking less time than thread B's delay. As
a result, value 1 is first written into lmv. Otherwise, thread B's delay is
shorter than the time taken by thread A to force the evaluation of thunk.
In this case, value 0 is first written into lmv. Variable w observes the first
written value in lmv, which will coincide with the value of the secret.
The precise values of parameters constant, delay_C, and delay_B are
machine-specific and experimentally determined.
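
The same race can be reproduced outside LIO with plain GHC threads, which
isolates the role played by the shared thunk. The following is only a sketch
under stated assumptions: it uses forkIO and MVar from base instead of
forkLIO and LMVar, the function name raceOnThunk is hypothetical, and the
numbers in main are placeholders that would need tuning per machine, so no
particular outcome is claimed. The list size must be at least 1, otherwise
thread A fails and the second takeMVar never returns.

```haskell
import Control.Concurrent (forkIO, threadDelay)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Exception (evaluate)
import Control.Monad (when)

-- Plain-IO analogue of the attack: returns the first value written to
-- the shared MVar, which is the attacker's guess for `secret`.
-- Precondition: size >= 1.
raceOnThunk :: Int -> Int -> Int -> Int -> IO Int
raceOnThunk size delayC delayB secret = do
  mv <- newEmptyMVar
  let thunk = [1 .. size] :: [Int]      -- shared, initially unevaluated
  -- Thread C: forces the shared list only when the secret is non-zero.
  _ <- forkIO $ when (secret /= 0) $ do
         _ <- evaluate (last thunk)
         return ()
  threadDelay delayC                    -- give thread C time to finish
  -- Thread A: traverses the list (fast if already forced), then writes 1.
  _ <- forkIO $ do
         n <- evaluate (last thunk)
         when (n > 0) (putMVar mv 1)
  -- Thread B: waits a fixed delay, then writes 0.
  _ <- forkIO $ do
         threadDelay delayB
         putMVar mv 0
  w <- takeMVar mv                      -- winner of the race
  _ <- takeMVar mv                      -- drain the loser's write
  return w

main :: IO ()
main = do
  let size   = 1000000                  -- placeholder, machine-specific
      delayC = 500000                   -- microseconds, placeholder
      delayB = 5000                     -- microseconds, placeholder
  r0 <- raceOnThunk size delayC delayB 0
  r1 <- raceOnThunk size delayC delayB 1
  putStrLn ("guess for secret 0: " ++ show r0)
  putStrLn ("guess for secret 1: " ++ show r1)
```

Whether the guesses match the secrets depends on choosing delayB between the
time needed to re-traverse an evaluated list and the time needed to build it
from scratch, which is the experimental tuning the text refers to.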

The following code shows the magnification of the attack for a list of 
secret integers. 

    magnify :: [Labeled LH Int] -> LIO LH [Int]
    magnify ss = do lmv <- newEmptyLMVar L
                    mapM (attack lmv) ss

Function magnify takes a list of secret values ss (of type [Labeled LH Int]).
The magnification proceeds by creating the public LMVar
(newEmptyLMVar L) needed by the attack. Function mapM sequentially
applies function attack lmv (i.e. the attack) to every element in ss and
collects the results in a public list ([Int]).

Below, we present the final component required for the attack: 

    traverse :: [a] -> LIO LH a
    traverse xs = return (last xs)

This function simply returns the last element of the list given as argu- 
ment. 

The code for the attack can be downloaded from
http://www.cse.chalmers.se/~buiras/LazyAttack.tar.gz.

4 Restricting sharing

We propose a countermeasure based on restricting the sharing feature 
of lazy evaluation. Specifically, we propose duplicating shared thunks 
when spawning new threads. In that manner, sharing gets restricted to 
the lexical scope of each thread. Thunks being forced in one thread will 
then not affect the timing behaviour of the others. To illustrate this point, 
consider the shared thunk from Figure 1. If this countermeasure were
implemented, forcing the evaluation of thunk by thread C would not affect
the time taken by thread A to evaluate traverse thunk, making the attack 
no longer possible. An important drawback of this approach is that there 
would be a performance penalty incurred by disabling sharing among 
threads. Benchmarking and evaluation would be necessary to determine 
the full extent of the overhead inherent in the technique. Presumably, 
programmers could restructure their programs to minimise the effect of 
this penalty. 
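
The effect of the countermeasure can be illustrated with a toy cost model of
the heap. This sketch does not use Haskell's real heap or any duplication
primitive: Cell, Heap, force, dupAs and stepsSeenByA are hypothetical names,
a thunk is reduced to a single number of evaluation steps, and the copy is
made eagerly at fork time.

```haskell
-- A heap cell is either an unevaluated thunk with a cost in steps,
-- or an already memoised value.
data Cell = Thunk Int | Value deriving (Eq, Show)

type Heap = [(String, Cell)]

-- Force a variable: returns the steps spent and the updated heap.
force :: String -> Heap -> (Int, Heap)
force x heap = case lookup x heap of
  Just (Thunk cost) -> (cost, (x, Value) : filter ((/= x) . fst) heap)
  _                 -> (0, heap)   -- already a value (or unbound)

-- Copy the binding of x under the fresh name x'.
dupAs :: String -> String -> Heap -> Heap
dupAs x x' heap = case lookup x heap of
  Just cell -> (x', cell) : heap
  Nothing   -> heap

-- Steps that public thread A spends on its thunk after secret thread C
-- has (or has not) forced the original, with or without duplication.
stepsSeenByA :: Bool -> Int -> Int
stepsSeenByA duplicate secret = fst (force nameA heapAfterC)
  where
    heap0      = [("thunk", Thunk 1000)]
    heap1      = if duplicate then dupAs "thunk" "thunk_A" heap0 else heap0
    nameA      = if duplicate then "thunk_A" else "thunk"
    heapAfterC = if secret /= 0 then snd (force "thunk" heap1) else heap1

main :: IO ()
main = do
  print [stepsSeenByA False s | s <- [0, 1]]  -- shared thunk
  print [stepsSeenByA True  s | s <- [0, 1]]  -- duplicated at fork
```

With sharing, the work done by A differs between the two secrets, which is
the timing signal exploited by the attack. With a private copy, A performs
the same work in both cases, at the price of evaluating the thunk again.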

As an optimisation, it is possible to only duplicate thunks denot- 
ing pure expressions. Thunks denoting side-effecting expressions can 
be shared across threads without jeopardising security. The reason
lies in LIO's ability to monitor side-effects. If a thread that depends
on the secret forces the evaluation of side-effecting computations,
the resulting side-effects are required to agree with the IFC policy. For 
instance, threads with secrets in lexical scope can only force thunks that 
perform no public side-effects; otherwise LIO will abort the execution 
in order to preserve confidentiality. 

To implement our approach, we propose using deepDup, an oper- 
ation introduced by Joachim Breitner [2] to prevent sharing in Haskell. 
Essentially, deepDup takes a variable as its argument and creates a pri- 
vate copy of the whole heap reachable from it, effectively duplicating 
the argument thunk and disabling sharing between it and the original 
thunk. In his paper, Breitner shows how to extend Launchbury's natural 
semantics for lazy evaluation [7] with deepDup. The natural semantics
is given by a relation $\Gamma : t \Downarrow \Delta : v$, which represents the fact that from
the heap $\Gamma$ we can reduce term $t$ to the value $v$, producing a new heap
$\Delta$. It is the relation between $\Gamma$ and $\Delta$ which captures heap modifications
caused by memoisation. In this setting, the rule for deepDup is

\[
\frac{
  \begin{array}{c}
  \Gamma,\ x \mapsto e,\ x' \mapsto \hat{e}[y'_1/y_1, \ldots, y'_n/y_n],\ (y'_i \mapsto \mathsf{deepDup}\ y_i)_{i \in 1 \ldots n} : x' \Downarrow \Delta : z \\
  \mathit{ufv}(e) = \{y_1, \ldots, y_n\} \qquad x', y'_1, \ldots, y'_n\ \text{fresh}
  \end{array}
}{
  \Gamma,\ x \mapsto e : \mathsf{deepDup}\ x \Downarrow \Delta : z
}
\]

where $\mathit{ufv}(e)$ is the set of unguarded² free variables of $e$ and $\hat{e}$ is $e$ with
all bound variables renamed to fresh variables in order to avoid variable
capture when applying substitutions. Note that deepDup x duplicates
all the thunks reachable from x in a lazy manner: the free variables
$y_1, \ldots, y_n$ are replaced with calls to deepDup for each variable, so these
duplications will not be performed until those variables are actually
evaluated. Laziness is necessary to properly handle cyclic data structures,
since the duplication process would loop indefinitely if it were
to eagerly copy all thunks for such structures. As explained below, this
design decision has important consequences for security.

² Function ufv(e) is defined as the set of free variables that are not already
marked for duplication, i.e. $\mathit{ufv}(\mathsf{deepDup}\ x) = \emptyset$, and in the rest of the cases
it is inductively defined as usual.

In practice, we would use this primitive every time we fork a new
thread: we take the body of the new thread $m_1$ and the body of the parent
thread $m_2$, and replace them with deepDup $m_1$ and deepDup $m_2$. Due
to the lazy nature of the duplication performed by deepDup, it is
necessary to duplicate both thunks, i.e., $m_1$ and $m_2$. Consider two threads A
and B with current labels L and H, respectively, and suppose that they
both have a pointer to a certain thunk x in the same scope. If we only
duplicated the thunk in A (the public thread), thread B could evaluate
parts of x depending on the secret, before they have been duplicated in
thread A (recall that deepDup is lazy). This would cause the evaluation
of the same parts of the duplicated version of x in A to go faster, thus
conveying some information about the secret to thread A. In addition,
note that it is not possible to determine in advance — at the time forkLIO 
is called — which thread will raise its current label to H. Therefore, we 
must take care to duplicate all further references to shared thunks every 
time a fork occurs. 
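
The need to duplicate on both sides can be seen in a toy cost model in which
a copy is taken lazily, i.e. only when the copy is first forced. The model is
an illustrative assumption and not an implementation of deepDup: Cell, Heap,
update, force and stepsSeenByA are hypothetical names, and a LazyCopy cell
stands for a binding of the form y' bound to deepDup y.

```haskell
-- Thunk c: unevaluated, costs c steps. Value: memoised.
-- LazyCopy y: a pending duplicate of y, materialised when first forced.
data Cell = Thunk Int | Value | LazyCopy String deriving (Eq, Show)

type Heap = [(String, Cell)]

-- Rebind x to a new cell.
update :: String -> Cell -> Heap -> Heap
update x c heap = (x, c) : filter ((/= x) . fst) heap

-- Force a variable: returns the steps spent and the updated heap.
-- A pending copy snapshots the *current* state of its source first.
force :: String -> Heap -> (Int, Heap)
force x heap = case lookup x heap of
  Just (Thunk cost) -> (cost, update x Value heap)
  Just (LazyCopy y) -> case lookup y heap of
    Just c  -> force x (update x c heap)
    Nothing -> (0, heap)
  _                 -> (0, heap)

-- Steps public thread A spends forcing its lazily duplicated part, after
-- secret thread B has possibly forced its own reference to the same part.
-- The flag says whether B's side was duplicated as well.
stepsSeenByA :: Bool -> Int -> Int
stepsSeenByA dupSecretSide secret = fst (force "part_A" heapAfterB)
  where
    heap0      = [("part", Thunk 1000), ("part_A", LazyCopy "part")]
                 ++ [("part_B", LazyCopy "part") | dupSecretSide]
    nameB      = if dupSecretSide then "part_B" else "part"
    heapAfterB = if secret /= 0 then snd (force nameB heap0) else heap0

main :: IO ()
main = do
  print [stepsSeenByA False s | s <- [0, 1]]  -- only A's side duplicated
  print [stepsSeenByA True  s | s <- [0, 1]]  -- both sides duplicated
```

When only A's reference is duplicated, B can force the original before A's
copy is materialised, so A copies an already evaluated cell and runs faster
when the secret is non-zero. When B works on its own copy, the original stays
untouched and A's cost no longer depends on the secret.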

As a possible optimisation, we advise designing a data dependency 
analysis capable of over-approximating which expressions are shared 
among threads. Once the list of expressions (and their scope) has been 
calculated, we would proceed to instrument the code, introducing in- 
structions that duplicate only the truly shared thunks at runtime, as op- 
posed to duplicating every pure thunk in the body of each thread. We 
believe that HERMIT [3] is an appropriate tool to deploy such instru- 
mentation as a code-to-code transformation. 

5 Conclusions 

We describe and implement a new way of leveraging lazy evaluation to 
leak secrets in LIO, a concurrent IFC system in Haskell. Beyond LIO, 
the attack points out a subtlety of IFC for programming languages with 
lazy semantics and concurrency. We propose a countermeasure based 
on duplicating thunks at the time of forking in order to restrict sharing 
among threads. For that, we propose to use the experimental Haskell 
package ghc-dup. This package provides operations that copy thunks in 
a lazy manner. Although convenient for preserving program semantics,
this design decision has implications for security. To deal with that, our
solution requires duplicating thunks for both the newly spawned thread
and its parent. As future work, we will implement the proposed
countermeasure, prove soundness (non-interference), evaluate its applicability
through different case studies, and introduce some optimisations to
reduce the number of duplicated thunks.